Abstract

We study the problem of recovering a hidden community of cardinality $K$ from an $n \times n$
symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i,j$ both belong to the
community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$ depending
on $n$. If $P = \mathrm{Bern}(p)$ and $Q = \mathrm{Bern}(q)$ with $p > q$, it reduces to the problem of finding
a densely-connected $K$-subgraph planted in a large Erdős-Rényi graph; if $P = \mathcal{N}(\mu, 1)$ and
$Q = \mathcal{N}(0,1)$ with $\mu > 0$, it corresponds to the problem of locating a $K \times K$ principal submatrix
of elevated means in a large Gaussian random matrix. We focus on two types of asymptotic
recovery guarantees as $n \to \infty$: (1) weak recovery: the expected number of classification errors is
$o(K)$; (2) exact recovery: the probability of classifying all indices correctly converges to one. Under
mild assumptions on $P$ and $Q$, and allowing the community size to scale sublinearly with $n$,
we derive a set of sufficient conditions and a set of necessary conditions for recovery, which are
asymptotically tight with sharp constants. The results hold in particular for the Gaussian case,
and for the case of bounded log likelihood ratio, including the Bernoulli case whenever $p/q$ and
$(1-p)/(1-q)$ are bounded away from zero and infinity. An important algorithmic implication is that,
whenever exact recovery is information theoretically possible, any algorithm that provides weak 
recovery when the community size is concentrated near K can be upgraded to achieve exact 
recovery in linear additional time by a simple voting procedure. 


1 Introduction 

Many modern datasets can be represented as networks with vertices denoting the objects and edges 
(sometimes weighted or labeled) encoding their pairwise interactions. An interesting problem is to 
identify a group of vertices with atypical interactions. In social network analysis, this group can 
be interpreted as a community with higher edge connectivities than the rest of the network; in 
microarray experiments, this group may correspond to a set of differentially expressed genes. To 
study this problem, we investigate the following probabilistic model considered in [18]. 

Definition 1 (Hidden Community Model). Let $C^*$ be drawn uniformly at random from all subsets
of $[n]$ of cardinality $K$. Given probability measures $P$ and $Q$ on a common measurable space, let
$A$ be an $n \times n$ symmetric matrix with zero diagonal where for all $1 \leq i < j \leq n$, the entries $A_{ij}$ are mutually
independent, and $A_{ij} \sim P$ if $i, j \in C^*$ and $A_{ij} \sim Q$ otherwise.

In this paper we assume that we only have access to pairwise information $A_{ij}$ for distinct indices
$i$ and $j$ whose distribution is either $P$ or $Q$ depending on the community membership; no direct
observation about the individual indices is available (hence the zero diagonal of $A$). Two choices
of $P$ and $Q$ arising in many applications are the following:
Urbana-Champaign, Urbana, IL, {b-hajek,yihongwu}@illinois.edu. J. Xu is with the Simons Institute for the
Theory of Computing, University of California, Berkeley, Berkeley, CA, jiamingxu@berkeley.edu.
• Bernoulli case: $P = \mathrm{Bern}(p)$ and $Q = \mathrm{Bern}(q)$ with $p \neq q$. When $p > q$, this coincides with
the planted dense subgraph model studied in [32, 7, 12, 21, 33], which is also a special case 
of the general stochastic block model [26] with a single community. In this case, the data 
matrix A corresponds to the adjacency matrix of a graph, where two vertices are connected 
with probability p if both belong to the community C*, and with probability q otherwise. 
Since p > q, the subgraph induced by C* is likely to be denser than the rest of the graph. 

• Gaussian case: $P = \mathcal{N}(\mu, 1)$ and $Q = \mathcal{N}(0,1)$ with $\mu \neq 0$. This corresponds to a symmetric
version of the submatrix localization problem studied in [37, 30, 10, 9, 31, 12, 11].¹ When
$\mu > 0$, the entries of $A$ with row and column indices in $C^*$ have positive mean $\mu$ except those
on the diagonal, while the rest of the entries have zero mean.

Given the data matrix $A$, the problem of interest is to accurately recover the underlying community
$C^*$. The distributions $P$ and $Q$ as well as the community size $K$ depend on the matrix size
$n$ in general. For simplicity we assume that these model parameters are known to the estimator.
The only assumptions on the community size $K$ we impose are that $K/n$ is bounded away from
one, and, to avoid triviality, that $K \geq 2$. Of particular interest is the case of $K = o(n)$, where the
community size grows sublinearly.

We focus on the following two types of recovery guarantees.² Let $\xi \in \{0,1\}^n$ denote the indicator
of the community such that $\mathrm{supp}(\xi) = C^*$. Let $\hat{\xi} \in \{0,1\}^n$ be an estimator.

Definition 2 (Exact Recovery). Estimator $\hat{\xi}$ exactly recovers $\xi$ if, as $n \to \infty$, $\mathbb{P}[\hat{\xi} \neq \xi] \to 0$, where
the probability is with respect to the randomness of $\xi$ and $A$.

Definition 3 (Weak Recovery). Estimator $\hat{\xi}$ weakly recovers $\xi$ if, as $n \to \infty$, $d_H(\xi, \hat{\xi})/K \to 0$ in
probability, where $d_H$ denotes the Hamming distance.

The existence of an estimator satisfying Definition 3 is equivalent to the existence of an estimator
such that $\mathbb{E}[d_H(\xi, \hat{\xi})] = o(K)$ (see Appendix A for a proof). Clearly, any estimator achieving exact
recovery also achieves weak recovery; for bounded $K$, exact and weak recovery are equivalent.

Intuitively, for a fixed network size n, as the community size K decreases, or the distributions P 
and Q get closer together, the recovery problem becomes harder. In this paper, we aim to address 
the following question: From an information-theoretic perspective, computational considerations 
aside, what are the fundamental limits of recovering the community? Specifically, we derive sharp 
necessary and sufficient conditions in terms of the model parameters under which the community 
can be exactly or weakly recovered. These results serve as benchmarks for evaluating practical 
algorithms and aid us in understanding the performance limits of polynomial-time algorithms. 

In addition to establishing information limits with sharp constants for general P and Q, we 
identify the following algorithmic connection between weak and exact recovery: If exact recovery 
is information-theoretically possible and there is an algorithm for weak recovery, then in linear 
additional time we can obtain exact recovery based on the weak recovery algorithm. This suggests 
that if the information limit of weak recovery can be obtained in polynomial time, then so can 
exact recovery; conversely, if there exists a computational barrier that separates the information 

¹ The previously studied submatrix localization model (also known as noisy biclustering) deals with submatrices
whose row and column supports need not coincide and the noise matrix is asymmetric, consisting of iid entries
throughout. Here we focus on locating principal submatrices contaminated by a symmetric noise matrix. Additionally,
we assume the diagonal does not carry any information. If instead we assume a nonzero diagonal with $A_{ii} \sim \mathcal{N}(\mu, 1)$
if $i \in C^*$ and $A_{ii} \sim \mathcal{N}(0, 1)$ if $i \notin C^*$, the results in this paper carry over with minor modifications explained in
Remark 11.

² Exact and weak recovery are called strong consistency and weak consistency in [34], respectively.
limit and the performance of polynomial-time algorithms for exact recovery, then weak recovery
also suffers from such a barrier. To establish the connection, we apply a two-step procedure: the
first step uses an estimator capable of weak recovery, even in the presence of a slight mismatch
between $|C^*|$ and $K$, such as the maximum likelihood estimator (see Lemma 4); the second step
cleans up the residual errors through a local voting procedure for each index. In order to ensure the
first and second steps are independent, we use a method which we call successive withholding. The
method of successive withholding is to randomly partition the set of indices into a finite number 
of subsets. One at a time, one subset is withheld to produce a reduced set of indices, and an 
estimation algorithm is run on the reduced set of indices. The estimate obtained from the reduced 
set of indices is used to classify the indices in the withheld subset. The idea is to gain independence: 
the outcome of estimation based on the reduced set of indices is independent of the data between 
the withheld indices and the reduced set of indices, and the withheld subset is sufficiently small 
so that we can still obtain sharp constants. This method is mentioned in [14], and variations of it 
have been used in [14], [35], and [34]. 
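The two-step procedure and the successive withholding idea can be illustrated with a small simulation. The sketch below is not the estimator analyzed in this paper: it assumes the Gaussian case, uses a simple degree-thresholding step as a hypothetical stand-in for a weak recovery estimator such as the MLE, and finishes by keeping the $K$ indices with the largest votes. The function names `sample_gaussian_instance` and `withhold_and_vote` are illustrative. The key property is visible in the loop: the estimate for each part is computed only from entries inside the reduced index set, while the votes use only entries between the withheld indices and the reduced set.

```python
import numpy as np


def sample_gaussian_instance(n, K, mu, rng):
    """Draw (A, xi) from the Gaussian hidden community model with zero diagonal."""
    xi = np.zeros(n, dtype=bool)
    xi[rng.choice(n, size=K, replace=False)] = True
    upper = np.triu(rng.standard_normal((n, n)), 1)
    A = upper + upper.T                      # symmetric noise, zero diagonal
    A[np.ix_(xi, xi)] += mu                  # elevated mean inside the community
    np.fill_diagonal(A, 0.0)                 # the diagonal carries no information
    return A, xi


def withhold_and_vote(A, K, mu, num_parts, rng):
    """Sketch: weak recovery on a reduced index set, then voting on the withheld indices."""
    n = A.shape[0]
    L = mu * A - mu**2 / 2                   # Gaussian LLR: L(x) = mu * (x - mu / 2)
    np.fill_diagonal(L, 0.0)
    part = rng.integers(num_parts, size=n)   # random partition of the indices
    scores = np.empty(n)
    for k in range(num_parts):
        withheld = np.flatnonzero(part == k)
        reduced = np.flatnonzero(part != k)
        # Step 1 (hypothetical stand-in): degree thresholding on the reduced submatrix only.
        k_red = max(1, round(K * len(reduced) / n))
        degrees = A[np.ix_(reduced, reduced)].sum(axis=1)
        C_hat = reduced[np.argsort(degrees)[-k_red:]]
        # Step 2: each withheld index votes with its total LLR to C_hat; these
        # entries were not used in Step 1, so they are independent of C_hat.
        scores[withheld] = L[np.ix_(withheld, C_hat)].sum(axis=1)
    xi_hat = np.zeros(n, dtype=bool)
    xi_hat[np.argsort(scores)[-K:]] = True   # assumption: keep the K best-scoring indices
    return xi_hat


rng = np.random.default_rng(0)
A, xi = sample_gaussian_instance(400, 40, 2.5, rng)
xi_hat = withhold_and_vote(A, 40, 2.5, num_parts=4, rng=rng)
print("misclassified indices:", int((xi_hat != xi).sum()))
```

The number of parts trades off independence against sharpness: each withheld subset should be small enough that the reduced set still supports a good first-step estimate.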

1.1 Related Work 

Previous work has determined the information limits for exact recovery up to universal constant
factors for some choices of $P$ and $Q$. For the Bernoulli case, it is shown in [12] that if $K d(q\|p) - c\log K \to \infty$ and $K d(p\|q) > c\log n$ for some large constant $c > 0$, then exact recovery is achievable
via the maximum likelihood estimator (MLE); conversely, if $K d(q\|p) < c'\log K$ and $K d(p\|q) < c'\log n$ for some small constant $c' > 0$, then exact recovery is impossible for any algorithm.
Similarly, for the Gaussian case, it is proved in [30] that if $K\mu^2 > c\log n$, then exact recovery
is achievable via the MLE; conversely, if $K\mu^2 < c'\log n$, exact recovery is impossible for any
algorithm. To the best of our knowledge, there are only a few special cases where the information
limits with sharp constants are known:

• Bernoulli case with $p = 1$ and $q = 1/2$: It is widely known as the planted clique problem
[27]. If $K > 2(1 + \epsilon)\log_2 n$ for any $\epsilon > 0$, exact recovery is achievable via the MLE; if
$K < 2(1 - \epsilon)\log_2 n$, then exact recovery is impossible. Despite an extensive research effort,
polynomial-time algorithms are only known to achieve exact recovery for $K > c\sqrt{n}$ for any
constant $c > 0$ [3, 19, 16, 6, 18].

• Bernoulli case with $p = a\log n/n$ and $q = b\log n/n$ for fixed $a, b$ and $K = \rho n$ for a fixed
constant $0 < \rho < 1$. The recent work [20] finds an explicit threshold $\rho^*(a,b)$, such that
if $\rho > \rho^*(a,b)$, exact recovery is achievable in polynomial time via semidefinite relaxations
of the MLE with probability tending to one; if $\rho < \rho^*(a,b)$, any estimator fails to exactly
recover the cluster with probability tending to one regardless of the computational costs. This
conclusion is in sharp contrast to the computational barriers observed in the planted clique
problem.

• The paper of Butucea et al. [9] gives sharp results for a Gaussian submatrix recovery problem
similar to the one considered here (see Remark 7 for details).

While this paper focuses on information-theoretic limits, it complements other work investigating
computationally efficient recovery procedures, such as convex relaxations [4, 5, 12, 20, 23],
spectral methods [32], and message-passing algorithms [18, 33, 24, 22]. In particular, for both the
Bernoulli and Gaussian cases:

• if $K = \Theta(n)$, a linear-time degree-thresholding algorithm achieves the information limit of
weak recovery (see [22, Appendix A] and [24, Appendix A]);

• if $K = \omega(n/\log n)$, then whenever information-theoretically possible, exact recovery can be achieved
in polynomial time using semidefinite programming [23];

• if $K \geq \frac{n}{\log n}\left(\frac{1}{8e} + o(1)\right)$ for the Gaussian case and $K \geq \frac{n}{\log n}\left(\rho_{\rm BP}(a/b) + o(1)\right)$ for the Bernoulli
case,³ exact recovery can be attained in nearly linear time via message passing plus clean
up [22, 24] whenever information-theoretically possible.

However, it is an open problem whether any polynomial-time algorithm can achieve the respective information
limit of weak recovery for $K = o(n)$, or exact recovery for $K \leq \frac{n}{\log n}\left(\frac{1}{8e} - \epsilon\right)$ in the Gaussian
case and for $K \leq \frac{n}{\log n}\left(\rho_{\rm BP}(a/b) - \epsilon\right)$ in the Bernoulli case, for any fixed $\epsilon > 0$.

The related work [33] studies weak recovery in the sparse regime of $p = a/n$, $q = b/n$, and
$K = \kappa n$. In the iterated limit where first $n \to \infty$, and then $\kappa \to 0$ and $a, b \to \infty$, with $\lambda = \frac{\kappa^2(a-b)^2}{(1-\kappa)b}$
fixed, it is shown that a local algorithm, namely local belief propagation, achieves weak recovery
in linear time if $\lambda e > 1$ and conversely, if $\lambda e < 1$, no local algorithm can achieve weak recovery.
Moreover, it is shown that for any $\lambda > 0$, the MLE achieves a recovery guarantee similar to weak
recovery in Definition 3. In comparison, the sharp information limit for weak recovery identified in
Corollary 1 below allows $p$, $q$ and $K$ to vary simultaneously with $n$ as $n \to \infty$.

Finally, we briefly compare the results of this paper to those of [1] and [34] on the planted 
bisection model (also known as the binary symmetric stochastic block model), where the vertices 
are partitioned into two equal-sized communities. First, a necessary and sufficient condition for 
weak recovery and a necessary and sufficient condition for exact recovery are obtained in [34]. In 
this paper, sufficient and necessary conditions, (7) and (8) in Theorem 1, are presented separately. 
These conditions match up except right at the boundary; we do not determine whether recovery 
is possible exactly at the boundary. The result for exact recovery in [1] is similar in that regard. 
Perhaps future work, based on techniques from [34], can provide a more refined analysis for the 
recovery problem at the boundary. Secondly, when recovery is information theoretically possible for 
the planted bisection problem, efficient algorithms are shown to exist in [1] and [34]. In contrast, 
for detecting or recovering a single community whose size is sublinear in the network size, there can 
be a significant gap between what is information theoretically possible and what can be achieved 
by existing efficient algorithms (see [3, 8, 31, 21, 33]). We turn instead to the MLE for proof of
optimal achievability. Finally, this paper covers both the Gaussian and Bernoulli cases (and other
distributions) in a unified framework without assuming that the community size scales linearly with
the network size.

Notation For any positive integer $n$, let $[n] = \{1, \ldots, n\}$. For any set $T \subset [n]$, let $|T|$ denote
its cardinality and $T^c$ denote its complement. We use standard big O notations, e.g., for any
sequences $\{a_n\}$ and $\{b_n\}$, $a_n = \Theta(b_n)$ or $a_n \asymp b_n$ if there is an absolute constant $c > 0$ such
that $1/c \leq a_n/b_n \leq c$. Let $\mathrm{Binom}(n, p)$ denote the binomial distribution with $n$ trials and success
probability $p$. Let $D(P\|Q) = \mathbb{E}_P\left[\log\frac{dP}{dQ}\right]$ denote the Kullback-Leibler (KL) divergence between
distributions $P$ and $Q$. Let $\mathrm{Bern}(p)$ denote the Bernoulli distribution with mean $p$ and
$d(p\|q) = D(\mathrm{Bern}(p)\|\mathrm{Bern}(q)) = p\log\frac{p}{q} + \bar{p}\log\frac{\bar{p}}{\bar{q}}$, where $\bar{p} = 1 - p$. Logarithms are natural and we adopt the
convention $0\log 0 = 0$. Let $\Phi(x)$ and $Q(x)$ denote the cumulative distribution function (CDF) and
complementary CDF of the standard normal distribution, respectively.
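As a concrete check of the notation, the binary divergence $d(p\|q)$ can be evaluated directly. The sketch below assumes natural logarithms and implements the convention $0\log 0 = 0$; the helper names are illustrative.

```python
import math


def x_log_x_over_y(x, y):
    """x * log(x / y) with the convention 0 log 0 = 0 (natural logarithm)."""
    return 0.0 if x == 0 else x * math.log(x / y)


def binary_kl(p, q):
    """d(p||q) = D(Bern(p) || Bern(q)) = p log(p/q) + (1-p) log((1-p)/(1-q))."""
    return x_log_x_over_y(p, q) + x_log_x_over_y(1 - p, 1 - q)


# The KL divergence is not symmetric: d(0.6||0.3) differs from d(0.3||0.6).
print(binary_kl(0.6, 0.3), binary_kl(0.3, 0.6))
```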

³ Here $\rho_{\rm BP}(a/b)$ denotes a constant depending only on $a/b$.
2 Overview of Main Results


2.1 Background on Maximum Likelihood Estimator and Assumptions

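Given the data matrix $A$, a sufficient statistic for estimating the community $C^*$ is the log likelihood
ratio (LLR) matrix $L \in \mathbb{R}^{n \times n}$, where $L_{ij} = \log\frac{dP}{dQ}(A_{ij})$ for $i \neq j$ and $L_{ii} = 0$. For $S, T \subset [n]$,
define

$$e(S,T) = \sum_{(i<j):\, (i,j) \in (S \times T) \cup (T \times S)} L_{ij}. \tag{1}$$

Let $\widehat{C}_{\rm ML}$ denote the maximum likelihood estimator (MLE) of $C^*$, given by:

$$\widehat{C}_{\rm ML} = \arg\max_{C \subset [n]} \{ e(C, C) : |C| = K \}, \tag{2}$$

which minimizes the error probability $\mathbb{P}\{\widehat{C} \neq C^*\}$ because $C^*$ is equiprobable by assumption.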
Evaluating the MLE requires knowledge of $K$. Computation of the MLE is NP-hard for general
values of $n$ and $K$ because certifying the existence of a clique of a specified size in an undirected
graph, which is known to be an NP-complete problem [29], can be reduced to computation of
the MLE. Thus, evaluating the MLE in the worst case is deemed computationally intractable. It
is worth noting that the optimal estimator that minimizes the expected number of misclassified
indices (Hamming loss) is the bit-MAP decoder $\tilde{\xi} = (\tilde{\xi}_i)$, where $\tilde{\xi}_i = \arg\max_{j \in \{0,1\}} \mathbb{P}[\xi_i = j \mid A]$.

Therefore, although the MLE is optimal for exact recovery, it need not be optimal for weak recovery;
nevertheless, we choose to analyze the MLE due to its simplicity, and it turns out to be asymptotically
optimal for weak recovery as well.
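For very small instances the MLE in (2) can be computed by exhaustive search over all $\binom{n}{K}$ subsets, which makes the exponential cost behind the intractability remark concrete. The sketch below assumes the Bernoulli case, builds the LLR matrix $L$, evaluates $e(C,C)$ from (1), and returns the maximizing subset. The function names are hypothetical, and the toy graph (a triangle planted on $\{0,1,2\}$ among six vertices, with $p = 0.9$, $q = 0.1$) is chosen only for illustration.

```python
import itertools

import numpy as np


def bernoulli_llr(A, p, q):
    """L_ij = log(p/q) if A_ij = 1 and log((1-p)/(1-q)) if A_ij = 0, with L_ii = 0."""
    L = np.where(A == 1, np.log(p / q), np.log((1 - p) / (1 - q)))
    np.fill_diagonal(L, 0.0)
    return L


def e_within(L, C):
    """e(C, C) from (1): sum of L_ij over pairs i < j with i, j in C."""
    idx = np.asarray(C)
    return L[np.ix_(idx, idx)].sum() / 2.0   # each pair counted twice; diagonal is zero


def brute_force_mle(L, K):
    """Exhaustive search over all K-subsets: exponential time, tiny n only."""
    n = L.shape[0]
    return max(itertools.combinations(range(n), K), key=lambda C: e_within(L, C))


A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2)]:        # a planted triangle on {0, 1, 2}
    A[i, j] = A[j, i] = 1
print(brute_force_mle(bernoulli_llr(A, 0.9, 0.1), K=3))   # (0, 1, 2)
```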

Our results require mild regularity conditions on the size of the hidden community $K$ and on
the pair of distributions, $P$ and $Q$. Specifically, for $K$, it is assumed without further comment that

$$\limsup_{n \to \infty} K/n < 1.$$

This assumption implies that $\frac{\log(n-K)}{\log n} \to 1$, so in several asymptotic results $\log n$ and $\log(n - K)$
are interchangeable; we give preference to $\log n$. Also, to avoid triviality, it is assumed throughout
that $K \geq 2$.


To state the assumption on $P$ and $Q$ we introduce some standard notation associated with
binary hypothesis testing based on independent samples. Throughout the paper we assume the
KL divergences $D(P\|Q)$ and $D(Q\|P)$ are finite. In particular, $P$ and $Q$ are mutually absolutely
continuous, and the likelihood ratio, $\frac{dP}{dQ}$, satisfies $\mathbb{E}_Q\left[\frac{dP}{dQ}\right] = \mathbb{E}_P\left[\left(\frac{dP}{dQ}\right)^{-1}\right] = 1$. Let $L = \log\frac{dP}{dQ}$
denote the LLR. The likelihood ratio test for $n$ observations and threshold $n\theta$ is to declare $P$ to be
the true distribution if $\sum_{k=1}^n L_k \geq n\theta$ and to declare $Q$ otherwise. For $\theta \in [-D(Q\|P), D(P\|Q)]$,
the standard Chernoff bounds for error probability of this likelihood ratio test are given by:

$$Q\left[\sum_{k=1}^n L_k \geq n\theta\right] \leq \exp(-n E_Q(\theta)), \tag{3}$$

$$P\left[\sum_{k=1}^n L_k \leq n\theta\right] \leq \exp(-n E_P(\theta)), \tag{4}$$

where the log moment generating functions of $L$ are denoted by $\psi_Q(\lambda) = \log\mathbb{E}_Q[\exp(\lambda L)]$ and
$\psi_P(\lambda) = \log\mathbb{E}_P[\exp(\lambda L)] = \psi_Q(\lambda + 1)$, and the large deviations exponents are given by Legendre
transforms of the log moment generating functions:

$$E_Q(\theta) = \psi_Q^*(\theta) \triangleq \sup_{\lambda \in \mathbb{R}} \lambda\theta - \psi_Q(\lambda), \qquad E_P(\theta) = \psi_P^*(\theta) \triangleq \sup_{\lambda \in \mathbb{R}} \lambda\theta - \psi_P(\lambda) = E_Q(\theta) - \theta. \tag{5}$$

In particular, $E_P$ and $E_Q$ are convex functions. Moreover, since $\psi_Q'(0) = -D(Q\|P)$ and $\psi_Q'(1) = D(P\|Q)$,
we have $E_Q(-D(Q\|P)) = E_P(D(P\|Q)) = 0$ and hence $E_Q(D(P\|Q)) = D(P\|Q)$ and
$E_P(-D(Q\|P)) = D(Q\|P)$. Our regularity assumption on the pair $P$ and $Q$ is the following.

Assumption 1. There exists a constant $C$ such that for all $n$,

$$\psi_Q''(\lambda) \leq C\min\{D(P\|Q), D(Q\|P)\}, \quad \forall \lambda \in [-1,1]. \tag{6}$$

In general, $\psi_Q''(\lambda) = \psi_P''(\lambda - 1) = \mathrm{var}_{Q_\lambda}(L)$, where $Q_\lambda$ is the tilted distribution defined by
$dQ_\lambda = \exp(\lambda L - \psi_Q(\lambda))\, dQ$, so the point of Assumption 1 is to require these quantities for $\lambda \in [-1,1]$
to be bounded by a constant times the divergences. Assumption 1 is the strongest condition imposed
on $P$ and $Q$ in this paper; several of the results hold under weaker assumptions described in
Section 3, which are also weaker than sub-Gaussianity of the LLR.
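The Legendre transforms in (5) and the identities stated after them can be checked numerically. The sketch below assumes the Bernoulli case $P = \mathrm{Bern}(p)$, $Q = \mathrm{Bern}(q)$ and approximates each supremum over $\lambda \in \mathbb{R}$ by a maximum over a finite grid on $[-1, 2]$, which is adequate for $\theta \in [-D(Q\|P), D(P\|Q)]$ because the maximizer then lies in $[0,1]$. Function names are illustrative.

```python
import numpy as np


def psi_Q(lam, p, q):
    """psi_Q(lam) = log E_Q[exp(lam * L)] for P = Bern(p), Q = Bern(q)."""
    return np.log(q ** (1 - lam) * p**lam + (1 - q) ** (1 - lam) * (1 - p) ** lam)


# Finite grid standing in for the supremum over all real lambda (an approximation).
LAMBDAS = np.linspace(-1.0, 2.0, 300_001)


def E_Q(theta, p, q):
    """E_Q(theta) = sup_lam (lam * theta - psi_Q(lam)), approximated on LAMBDAS."""
    return np.max(LAMBDAS * theta - psi_Q(LAMBDAS, p, q))


def E_P(theta, p, q):
    """E_P(theta) = sup_lam (lam * theta - psi_P(lam)), using psi_P(lam) = psi_Q(lam + 1)."""
    return np.max(LAMBDAS * theta - psi_Q(LAMBDAS + 1.0, p, q))


p, q = 0.6, 0.3
D_PQ = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
print(E_Q(D_PQ, p, q), D_PQ)   # E_Q(D(P||Q)) = D(P||Q)
```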

Assumption 1 is fulfilled in the following cases:

1. Bounded LLR: Lemma 1 in Section 3 shows that Assumption 1 holds if $L$ is bounded by a
constant, which, in particular, holds in the Bernoulli case if both $p/q$ and $(1-p)/(1-q)$ are bounded away
from zero and infinity.

2. Gaussian case: In the Gaussian case $P = \mathcal{N}(\mu, 1)$, $Q = \mathcal{N}(0,1)$, we have $L(x) = \mu(x - \mu/2)$,
$D(P\|Q) = D(Q\|P) = \mu^2/2$, $\psi_Q(\lambda) = \frac{\lambda(\lambda - 1)\mu^2}{2}$, $E_Q(\theta) = \frac{1}{2\mu^2}\left(\theta + \frac{\mu^2}{2}\right)^2$ and $E_P(\theta) = E_Q(-\theta)$.
In particular, $\psi_Q''(\lambda) = \mu^2$, so Assumption 1 holds with $C = 2$ regardless of how $\mu$ varies with
$n$. More generally, for $P$ and $Q$ lying in the same exponential family, Appendix B provides a
simple sufficient condition to verify Assumption 1.
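For the Gaussian case, the closed forms above can be sanity-checked with a Monte Carlo estimate of $\psi_Q(\lambda)$ and a grid evaluation of the Legendre transform. This is a minimal sketch; the values $\mu = 1.5$, $\lambda = 0.5$, the seed, and the sample size are arbitrary choices, and the function names are illustrative.

```python
import numpy as np


def psi_Q_gauss(lam, mu):
    """psi_Q(lam) = lam (lam - 1) mu^2 / 2 for P = N(mu, 1), Q = N(0, 1)."""
    return lam * (lam - 1) * mu**2 / 2


def E_Q_gauss(theta, mu):
    """Closed-form Legendre transform (theta + mu^2/2)^2 / (2 mu^2)."""
    return (theta + mu**2 / 2) ** 2 / (2 * mu**2)


mu = 1.5
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)        # samples from Q = N(0, 1)
L = mu * (x - mu / 2)                     # LLR L(x) = mu (x - mu/2)
lam = 0.5
mc = np.log(np.mean(np.exp(lam * L)))     # Monte Carlo estimate of psi_Q(lam)
print(mc, psi_Q_gauss(lam, mu))
```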

2.2 Weak Recovery 

The following theorem is our main result about weak recovery. It gives a sufficient condition and 
a matching necessary condition for weak recovery. 

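Theorem 1. Suppose Assumption 1 holds. If

$$K \cdot D(P\|Q) \to \infty \quad \text{and} \quad \liminf_{n \to \infty} \frac{(K-1)D(P\|Q)}{\log\frac{n}{K}} > 2, \tag{7}$$

then

$$\mathbb{P}\{|\widehat{C}_{\rm ML} \triangle C^*| \leq 2K\epsilon\} \geq 1 - e^{-\Omega(K/\epsilon)},$$

where $\epsilon = 1/\sqrt{K D(P\|Q)}$.

If there exists $\hat{\xi}$ such that $\mathbb{E}[d_H(\xi, \hat{\xi})] = o(K)$, then

$$K \cdot D(P\|Q) \to \infty \quad \text{and} \quad \liminf_{n \to \infty} \frac{(K-1)D(P\|Q)}{\log\frac{n}{K}} \geq 2. \tag{8}$$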

Remark 1. The assumption $K \geq 2$ implies $K/2 \leq K - 1 < K$, so the first parts of (7) and (8)
would have the same meaning if $K$ were replaced by $K - 1$. In the special case of bounded LLR,
the factor $K - 1$ in the second parts of (7) and (8) can be replaced by $K$. This is because if $\log\frac{dP}{dQ}$
is bounded, so is $D(P\|Q)$, and $K D(P\|Q) \to \infty$ implies $K \to \infty$ and hence also $(K - 1)/K \to 1$.
Corollary 1 (Weak recovery in Bernoulli case). Suppose the ratios $\log\frac{p}{q}$ and $\log\frac{1-p}{1-q}$ are bounded. If

$$K \cdot d(p\|q) \to \infty \quad \text{and} \quad \liminf_{n \to \infty} \frac{K \cdot d(p\|q)}{\log\frac{n}{K}} > 2, \tag{9}$$

then weak recovery is possible. If weak recovery is possible, then

$$K \cdot d(p\|q) \to \infty \quad \text{and} \quad \liminf_{n \to \infty} \frac{K \cdot d(p\|q)}{\log\frac{n}{K}} \geq 2. \tag{10}$$

Remark 2. Condition (10) is necessary even if $p/q \to \infty$, but (9) alone is not sufficient without the
assumption that p/q is bounded. This can be seen by considering the extreme case where K = n/2, 
p = 1/n, and q = e“”'. In this case, condition (9) is clearly satisfied; however, the subgraph induced 
by indices in the cluster is an Erdős-Rényi random graph with edge probability $1/n$, which contains
at least a constant fraction of isolated vertices with probability converging to one as $n \to \infty$. It
is not possible to correctly determine whether the isolated vertices are in the cluster, hence the
impossibility of weak recovery.

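Corollary 2 (Weak recovery in Gaussian case). If

$$K\mu^2 \to \infty \quad \text{and} \quad \liminf_{n \to \infty} \frac{(K-1)\mu^2}{\log\frac{n}{K}} > 4, \tag{11}$$

then weak recovery is possible. If weak recovery is possible, then

$$K\mu^2 \to \infty \quad \text{and} \quad \liminf_{n \to \infty} \frac{(K-1)\mu^2}{\log\frac{n}{K}} \geq 4. \tag{12}$$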
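The conditions in Corollaries 1 and 2 are asymptotic (they involve a $\liminf$), so a single finite $n$ can only indicate which side of the threshold a parameter choice is on. With that caveat, the sketch below evaluates the finite-$n$ ratios appearing in (9) and (11); the function names and the example parameters are assumptions for illustration.

```python
import math


def bernoulli_weak_ratio(n, K, p, q):
    """K d(p||q) / log(n/K): the quantity whose liminf is compared with 2 in (9)-(10)."""
    d = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return K * d / math.log(n / K)


def gaussian_weak_ratio(n, K, mu):
    """(K - 1) mu^2 / log(n/K): the quantity whose liminf is compared with 4 in (11)-(12)."""
    return (K - 1) * mu**2 / math.log(n / K)


print(gaussian_weak_ratio(10**6, 1000, 0.2), bernoulli_weak_ratio(10**6, 1000, 0.2, 0.1))
```

Since $D(P\|Q) = \mu^2/2$ in the Gaussian case, the Gaussian ratio is exactly twice the ratio in (7), which is why the threshold 2 becomes 4.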


2.3 Exact Recovery 

The following theorem states our main result about exact recovery. It gives a sufficient condition 
and a matching necessary condition for exact recovery. Since exact recovery implies weak recovery, 
conditions from Theorem 1 naturally enter. 


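Theorem 2. Suppose Assumption 1 holds. If (7) and the following hold:

$$\liminf_{n \to \infty} \frac{K E_Q\left(\frac{1}{K}\log\frac{n}{K}\right)}{\log n} > 1, \tag{13}$$

then the maximum likelihood estimator satisfies $\mathbb{P}\{\widehat{C}_{\rm ML} = C^*\} \to 1$.

If there exists an estimator $\widehat{C}$ such that $\mathbb{P}\{\widehat{C} = C^*\} \to 1$, then (8) and the following hold:

$$\liminf_{n \to \infty} \frac{K E_Q\left(\frac{1}{K}\log\frac{n}{K}\right)}{\log n} \geq 1. \tag{14}$$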


Remark 3. In the special case of linear community size, i.e., $K = \Theta(n)$, (13) and (14) can be
simplified by replacing $E_Q\left(\frac{1}{K}\log\frac{n}{K}\right)$ by the Chernoff index between $P$ and $Q$ [13]:

$$E_P(0) = E_Q(0) = \sup_{0 \leq \lambda \leq 1} -\log\int dP^{\lambda} dQ^{1-\lambda} \triangleq C(P,Q). \tag{15}$$

To see this, note that in the definition of $E_Q(\theta)$ in (5) the supremum can be restricted to $\lambda \in [0,1]$,
and hence $E_Q(\theta) \leq E_Q(\theta + \delta) \leq E_Q(\theta) + \delta$ as long as $-D(Q\|P) \leq \theta \leq \theta + \delta \leq D(P\|Q)$.
By (7), $\delta = \frac{1}{K}\log\frac{n}{K} \leq D(P\|Q)$ for all sufficiently large $n$. Hence, in the case of $K = \Theta(n)$,
$C(P,Q) \leq E_Q\left(\frac{1}{K}\log\frac{n}{K}\right) \leq C(P,Q) + O\left(\frac{1}{K}\right)$, proving the claim. The Chernoff index $C(P,Q)$ gives
the optimal exponent for the decay of the sum of error probabilities for the binary hypothesis testing
problem in the large-sample limit.
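The Chernoff index (15) and the sandwich $C(P,Q) \leq E_Q(\delta) \leq C(P,Q) + \delta$ used in Remark 3 can be illustrated numerically. The sketch assumes the Bernoulli case, restricts $\lambda$ to a grid on $[0,1]$ as in Remark 3, and uses $\delta = 0.05$ as a stand-in for $\frac{1}{K}\log\frac{n}{K}$; all names and values are illustrative.

```python
import numpy as np

LAMBDAS = np.linspace(0.0, 1.0, 100_001)   # grid on [0, 1], as allowed by Remark 3


def bhattacharyya_sum(p, q):
    """sum_x P(x)^lam Q(x)^(1-lam) on the grid, for P = Bern(p), Q = Bern(q)."""
    return p**LAMBDAS * q ** (1 - LAMBDAS) + (1 - p) ** LAMBDAS * (1 - q) ** (1 - LAMBDAS)


def chernoff_index_bernoulli(p, q):
    """C(P, Q) = sup_{0<=lam<=1} -log sum_x P(x)^lam Q(x)^(1-lam)."""
    return np.max(-np.log(bhattacharyya_sum(p, q)))


def E_Q_bernoulli(theta, p, q):
    """E_Q(theta) with the supremum restricted to lam in [0, 1]."""
    return np.max(LAMBDAS * theta - np.log(bhattacharyya_sum(p, q)))


p, q = 0.6, 0.3
C = chernoff_index_bernoulli(p, q)
delta = 0.05        # stand-in for (1/K) log(n/K), which must not exceed D(P||Q)
print(C, E_Q_bernoulli(delta, p, q), C + delta)
```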

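Corollary 3 (Exact recovery in Bernoulli case). Suppose $\log\frac{p}{q}$ and $\log\frac{1-p}{1-q}$ are bounded. If (9) holds,
and

$$\liminf_{n \to \infty} \frac{K d(\tau^*\|q)}{\log n} > 1, \tag{16}$$

where

$$\tau^* = \frac{\log\frac{1-q}{1-p} + \frac{1}{K}\log\frac{n}{K}}{\log\frac{p(1-q)}{q(1-p)}}, \tag{17}$$

then exact recovery is possible. If exact recovery is possible, then (10) holds, and

$$\liminf_{n \to \infty} \frac{K d(\tau^*\|q)}{\log n} \geq 1. \tag{18}$$

Proof. In the Bernoulli case, $E_P(\theta) = d(\alpha\|p)$ and $E_Q(\theta) = d(\alpha\|q)$, where $\alpha = \left(\theta + \log\frac{1-q}{1-p}\right)/\log\frac{p(1-q)}{q(1-p)}$. Taking $\theta = \frac{1}{K}\log\frac{n}{K}$ gives $\alpha = \tau^*$, so (13) and (14) become (16) and (18), respectively. □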
Remark 4. Consider the Bernoulli case in the regime

$$K = \frac{\rho n}{\log^{s-1} n}, \qquad p = \frac{a\log^s n}{n}, \qquad q = \frac{b\log^s n}{n},$$

where $s \geq 1$ is fixed, $\rho \in (0,1)$ and $a > b > 0$. Let $I(x,y) = x - y\log(ex/y)$ for $x,y > 0$. Then the
sharp recovery thresholds are determined by Corollaries 1 and 3 as follows: For any $\epsilon > 0$,

• For $s > 1$, if $\rho I(b,a) > \frac{(2+\epsilon)(s-1)\log\log n}{\log n}$, weak recovery is possible; if $\rho I(b,a) < \frac{(2-\epsilon)(s-1)\log\log n}{\log n}$, then weak recovery is impossible. For $s = 1$, weak recovery is possible
if and only if $\rho I(b,a) = \omega\left(\frac{1}{\log n}\right)$.

• Assume $\rho, a, b$ are fixed constants. Let $\tau_0 = (a - b)/\log(a/b)$. Then exact recovery is possible
if $\rho I(b,\tau_0) > 1$; conversely, if $\rho I(b,\tau_0) < 1$, then exact recovery is impossible, generalizing
the previous results of [20, 2] for linear community size ($s = 1$). To see this, note that by
definition, $\tau^* = (1 + o(1))\tau_0\log^s n/n$, and thus $d(\tau^*\|q) = (1 + o(1)) I(b,\tau_0)\log^s n/n$.
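The approximation $\tau^* = (1+o(1))\tau_0\log^s n/n$ in Remark 4 can be checked at a finite $n$. The sketch below uses the arbitrary values $n = 10^8$, $s = 1$, $\rho = 0.5$, $a = 4$, $b = 1$; since the $o(1)$ terms decay slowly, the finite-$n$ quantity $K d(\tau^*\|q)/\log n$ only roughly matches its limit $\rho I(b,\tau_0)$. Helper names are illustrative, and `log1p` is used to keep the tiny probabilities accurate.

```python
import math


def binary_kl(a, b):
    """d(a||b) for Bernoulli means a, b; log1p keeps tiny probabilities accurate."""
    return a * math.log(a / b) + (1 - a) * (math.log1p(-a) - math.log1p(-b))


def tau_star(n, K, p, q):
    """tau* as defined in (17)."""
    num = math.log1p(-q) - math.log1p(-p) + math.log(n / K) / K
    den = math.log(p / q) + math.log1p(-q) - math.log1p(-p)
    return num / den


def rate_I(x, y):
    """I(x, y) = x - y log(e x / y) from Remark 4."""
    return x - y * math.log(math.e * x / y)


n, s, rho, a, b = 10**8, 1, 0.5, 4.0, 1.0
K = rho * n / math.log(n) ** (s - 1)
p, q = a * math.log(n) ** s / n, b * math.log(n) ** s / n
tau0 = (a - b) / math.log(a / b)
ts = tau_star(n, K, p, q)
ratio_tau = ts / (tau0 * math.log(n) ** s / n)     # close to 1
exact_stat = K * binary_kl(ts, q) / math.log(n)     # finite-n version of the quantity in (16)
print(ratio_tau, exact_stat, rho * rate_I(b, tau0))
```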


Remark 5. The recent work [28] considered a generalized planted bisection model where $A_{ij} \sim P$
if $i,j$ are in the same community and $A_{ij} \sim Q$ otherwise. Their result applies to the following
generalization of the Bernoulli distribution, where $P = (p_0, \ldots, p_m)$ and $Q = (q_0, \ldots, q_m)$ with
$p_i = \frac{a_i\log n}{n}$, $q_i = \frac{b_i\log n}{n}$, $1 \leq i \leq m$, for some $m \geq 1$ and positive constants $a_i, b_i$, $1 \leq i \leq m$. For
this family of distributions the LLR is bounded and hence Theorem 2 gives the sharp condition
for recovering a single hidden community. Specifically, note that $\psi_Q(\lambda) = -\frac{\log n}{n}\sum_{i=1}^m\left(\lambda a_i + (1-\lambda)b_i - a_i^{\lambda}b_i^{1-\lambda} + o(1)\right)$. Thus for $K = \rho n$ with a fixed $\rho$, the sharp threshold of exact recovery is given
by $\rho\sup_{0 \leq \lambda \leq 1}\sum_{i=1}^m\left(\lambda a_i + (1-\lambda)b_i - a_i^{\lambda}b_i^{1-\lambda}\right) > 1$. For $m = 1$ with $a_1 = a$ and $b_1 = b$, the optimal $\lambda$ is
determined by $a^{\lambda}b^{1-\lambda} = (a - b)/\log(a/b) = \tau_0$, and the sharp threshold of exact recovery simplifies
to $\rho I(b,\tau_0) > 1$, recovering the result for the Bernoulli case given in Remark 4.
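The reduction of the threshold in Remark 5 to $\rho I(b,\tau_0) > 1$ when $m = 1$ can be verified by maximizing the exponent over a grid of $\lambda \in [0,1]$. This sketch uses the illustrative values $a = 4$, $b = 1$; `exact_exponent` is a hypothetical helper that also accepts vectors $(a_i)$ and $(b_i)$ for $m > 1$.

```python
import numpy as np

LAM = np.linspace(0.0, 1.0, 100_001)


def exact_exponent(a_vec, b_vec):
    """Grid approximation of sup_{0<=lam<=1} sum_i (lam a_i + (1-lam) b_i - a_i^lam b_i^(1-lam)).

    Returns the maximal value and the maximizing lam on the grid.
    """
    a_vec = np.asarray(a_vec, dtype=float)
    b_vec = np.asarray(b_vec, dtype=float)
    lam = LAM[:, None]                    # shape (grid, 1) broadcasts against shape (m,)
    vals = (lam * a_vec + (1 - lam) * b_vec - a_vec**lam * b_vec ** (1 - lam)).sum(axis=1)
    k = int(np.argmax(vals))
    return vals[k], LAM[k]


a, b = 4.0, 1.0
value, lam_opt = exact_exponent([a], [b])
tau0 = (a - b) / np.log(a / b)
print(value, b - tau0 * np.log(np.e * b / tau0))   # both approximately I(b, tau0)
```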


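Corollary 4 (Exact recovery in Gaussian case). If (11) holds and

$$\liminf_{n \to \infty} \frac{K\mu^2}{\left(\sqrt{2\log n} + \sqrt{2\log K}\right)^2} > 1, \tag{19}$$

then exact recovery is possible. If exact recovery is possible, then (12) holds and

$$\liminf_{n \to \infty} \frac{K\mu^2}{\left(\sqrt{2\log n} + \sqrt{2\log K}\right)^2} \geq 1. \tag{20}$$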

See Appendix C for a proof of Corollary 4. 
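Corollary 4 can be related to Theorem 2 through the Gaussian exponent $E_Q(\theta) = \frac{1}{2\mu^2}(\theta + \mu^2/2)^2$: setting $y = K\mu^2/2$, the condition $K E_Q\left(\frac{1}{K}\log\frac{n}{K}\right) > \log n$ becomes $y^2 - 2y(\log n + \log K) + (\log n - \log K)^2 > 0$, whose larger root is $y = (\sqrt{\log n} + \sqrt{\log K})^2$. The sketch below checks numerically that, for values of $\mu$ satisfying (11), the ratios in (13) and (19) fall on the same side of 1. It is an illustration only, not the proof in Appendix C; $n = 10^6$ and $K = 1000$ are arbitrary, and the function names are hypothetical.

```python
import math


def E_Q_gauss(theta, mu):
    """Closed form (theta + mu^2/2)^2 / (2 mu^2) for P = N(mu, 1), Q = N(0, 1)."""
    return (theta + mu**2 / 2) ** 2 / (2 * mu**2)


def theorem2_ratio(n, K, mu):
    """K E_Q((1/K) log(n/K)) / log n, the quantity in (13)-(14)."""
    return K * E_Q_gauss(math.log(n / K) / K, mu) / math.log(n)


def corollary4_ratio(n, K, mu):
    """K mu^2 / (sqrt(2 log n) + sqrt(2 log K))^2, the quantity in (19)-(20)."""
    return K * mu**2 / (math.sqrt(2 * math.log(n)) + math.sqrt(2 * math.log(K))) ** 2


n, K = 10**6, 1000
mu2_values = [0.03 + 0.005 * i for i in range(40)]   # each satisfies (K - 1) mu^2 > 4 log(n/K)
for mu2 in mu2_values:
    mu = math.sqrt(mu2)
    print(round(mu2, 3), theorem2_ratio(n, K, mu) > 1, corollary4_ratio(n, K, mu) > 1)
```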


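Remark 6. Consider the Gaussian case in the regime

$$K = \frac{\rho n}{\log^{s-1} n}, \qquad \mu^2 = \frac{\mu_0^2\log^s n}{n},$$

where $s \geq 1$ and $\rho \in (0,1)$ are fixed constants. The critical signal strength that allows weak or
exact recovery is determined by Corollaries 2 and 4 as follows: For any $\epsilon > 0$,

• For $s > 1$, if $\mu_0 > (2 + \epsilon)\sqrt{\frac{(s-1)\log\log n}{\rho\log n}}$, then weak recovery is possible; conversely, if $\mu_0 < (2 - \epsilon)\sqrt{\frac{(s-1)\log\log n}{\rho\log n}}$, then weak recovery is impossible. For $s = 1$, weak recovery is possible
if and only if $\mu_0 = \omega\left(1/\sqrt{\log n}\right)$.

• If $\mu_0 > (2\sqrt{2} + \epsilon)/\sqrt{\rho}$, then exact recovery is possible; conversely, if $\mu_0 < (2\sqrt{2} - \epsilon)/\sqrt{\rho}$, then exact recovery
is impossible.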

Remark 7. Butucea et al. [9] consider the submatrix localization model with an $n \times m$ submatrix
with an elevated mean in an $N \times M$ large Gaussian random matrix with independent entries,
and give sufficient conditions and necessary conditions, matching up to constant factors, for exact
recovery, which are analogous to those of Corollary 4. Setting $(n, m, N, M)$ in [9, (2.3)] (sufficient
condition for exact recovery of rectangular submatrix) equal to $(K, K, n, n)$ gives precisely the
sufficient condition of Corollary 4 for exact recovery of a principal submatrix of size $K$ from symmetric
noise. This coincidence can be understood as follows. The nonsymmetric observations of [9,
(2.3)] in the case of parameters $(K, K, n, n)$ yield twice the information available from the symmetric
observation matrix we consider (diagonal observations excluded), while the amount of information
required to specify a $K \times K$ (not necessarily principal) submatrix of an $n \times n$ matrix is twice
the information needed to specify a principal one. The proof techniques of [9] are similar to ours,
with the main difference being that we simultaneously investigate conditions for weak and exact
recovery. Finally, the information limits of weak recovery for biclustering are established in [24,
Section 4.1] based on modifications of the arguments in [9].

Remark 8. If $K \leq n^{1/9}$, (11) implies (19), and thus (11) alone is sufficient for exact recovery; if
$K \geq n^{1/9}$, then (19) implies (11), and (19) alone is sufficient for exact recovery.
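The crossover point in Remark 8 comes from comparing the two thresholds on $K\mu^2$: $4\log(n/K)$ from (11) and $(\sqrt{2\log n} + \sqrt{2\log K})^2$ from (19). Writing $\log K = t^2\log n$, they coincide when $2(1 - t^2) = (1 + t)^2$, that is $(1 - 3t)(1 + t) = 0$, so $t = 1/3$ and $K = n^{1/9}$. The sketch below evaluates both thresholds, ignoring the difference between $K$ and $K - 1$ in (11); $n = 10^{18}$ is an arbitrary choice and the function names are illustrative.

```python
import math


def weak_threshold(n, K):
    """Threshold on K mu^2 from (11), with K - 1 replaced by K: 4 log(n/K)."""
    return 4 * math.log(n / K)


def exact_threshold(n, K):
    """Threshold on K mu^2 from (19): (sqrt(2 log n) + sqrt(2 log K))^2."""
    return (math.sqrt(2 * math.log(n)) + math.sqrt(2 * math.log(K))) ** 2


n = 10**18
for K in (10, n ** (1 / 9), 10**6):
    print(K, weak_threshold(n, K), exact_threshold(n, K))
```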

The remainder of the paper is organized as follows. Section 3 gives some preliminaries. Section 4
proves Theorem 1, pertaining to weak recovery, and Section 5 proves Theorem 2, pertaining to exact 
recovery. Additional results are introduced in Section 5, which highlight alternative sufficient and 
necessary conditions for exact recovery involving large deviation probabilities for sums of random 
variables, related to the voting procedure mentioned in the introduction. 
3 On the Assumptions on P and Q

This section presents some conditions sufficient for Assumption 1, and some implications of Assumption 1.

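Lemma 1 (Bounded LLR). If $|L| \leq B$ for some positive constant $B$, then Assumption 1 holds
with $C = 2e^{3B}$.

Proof. First, some background. Let $\phi(y) = e^y - 1 - y$, which is nonnegative, convex, with $\phi(0) = \phi'(0) = 0$ and $\phi''(y) = e^y$. Thus for $|y| \leq B$, $e^{-B} \leq \phi''(y) \leq e^B$ and hence $\frac{e^{-B}}{2}y^2 \leq \phi(y) \leq \frac{e^B}{2}y^2$.

Now to the proof. We begin by noticing that for all $\lambda \in [-1,1]$,

$$\psi_Q''(\lambda) = \mathrm{var}_{Q_\lambda}(L) \leq \mathbb{E}_{Q_\lambda}[L^2] = \frac{\mathbb{E}_Q[L^2 e^{\lambda L}]}{\mathbb{E}_Q[e^{\lambda L}]} \leq e^{2B}\,\mathbb{E}_Q[L^2].$$

In turn, using $y^2 \leq 2e^B\phi(y)$ as shown above and recalling that $L = \log\frac{dP}{dQ}$, we have

$$\mathbb{E}_Q[L^2] \leq 2e^B\,\mathbb{E}_Q[\phi(L)] = 2e^B D(Q\|P).$$

Combining the last two displayed equations yields $\psi_Q''(\lambda) \leq 2e^{3B}D(Q\|P)$ for $\lambda \in [-1,1]$. Abbreviate
$\psi_Q$ by $\psi$. By a variation of the argument above, for $\lambda \in [1,2]$ we have $\mathbb{E}_Q[e^{\lambda L}] = e^{\psi(\lambda)} \geq 1$ (since $\psi$ is convex with $\psi(0) = \psi(1) = 0$), and hence

$$\psi''(\lambda) = \mathrm{var}_{Q_\lambda}(L) \leq \mathbb{E}_{Q_\lambda}[L^2] = \frac{\mathbb{E}_Q[L^2 e^{\lambda L}]}{\mathbb{E}_Q[e^{\lambda L}]} \leq e^{2B}\,\mathbb{E}_Q[L^2] \quad \text{if } \lambda \in [1,2],$$

so that $\psi''(\lambda) \leq 2e^{3B}D(Q\|P)$ for $\lambda \in [0,2]$. Let $\tilde{\psi}$ denote the version of $\psi$ that would be obtained
if the roles of $P$ and $Q$ were swapped. Then $\tilde{\psi}''(\lambda) \leq 2e^{3B}D(P\|Q)$ for $\lambda \in [0,2]$. Since $\psi$ and
$\tilde{\psi}$ are related by reflection about $\lambda = 1/2$, namely $\tilde{\psi}(\lambda) = \psi(1 - \lambda)$, we have $\psi''(\lambda) \leq 2e^{3B}D(P\|Q)$ for
$\lambda \in [-1,1]$, completing the proof. □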
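Lemma 1 can be illustrated numerically in the Bernoulli case, where $L$ takes only two values. The sketch computes $\psi_Q''(\lambda) = \mathrm{var}_{Q_\lambda}(L)$ on a grid of $\lambda \in [-1,1]$ and compares its maximum with $2e^{3B}\min\{D(P\|Q), D(Q\|P)\}$, the bound from the lemma. The choice $p = 0.6$, $q = 0.3$ and the helper name are illustrative, and a grid check is of course not a proof.

```python
import numpy as np


def tilted_llr_variance(lam, p, q):
    """psi_Q''(lam): variance of L under dQ_lam proportional to exp(lam L) dQ, Bern(p) vs Bern(q)."""
    L = np.array([np.log((1 - p) / (1 - q)), np.log(p / q)])      # LLR at x = 0 and x = 1
    w = np.array([1 - q, q]) * np.exp(np.multiply.outer(lam, L))  # unnormalized tilted weights
    w /= w.sum(axis=-1, keepdims=True)
    mean = (w * L).sum(axis=-1)
    return (w * L**2).sum(axis=-1) - mean**2


p, q = 0.6, 0.3
B = max(abs(np.log(p / q)), abs(np.log((1 - p) / (1 - q))))        # |L| <= B
D_PQ = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
D_QP = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
lhs = tilted_llr_variance(np.linspace(-1.0, 1.0, 2001), p, q).max()
rhs = 2 * np.exp(3 * B) * min(D_PQ, D_QP)      # C min{D(P||Q), D(Q||P)} with C = 2e^{3B}
print(lhs, rhs)
```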

As shown in the proofs, Theorem 1 (weak recovery) and the sufficiency part of Theorem 2
(exact recovery) hold under assumptions somewhat weaker than Assumption 1; only the necessity
part of Theorem 2 relies on Assumption 1. To clarify this subtlety, we introduce two successively
weaker assumptions. We also provide a lemma showing that any of the assumptions implies the
equivalence $D(P\|Q) \asymp D(Q\|P) \asymp C(P,Q)$.

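Assumption 2. For some constant $C$:

$$\psi_P(\lambda) - D(P\|Q)\lambda \leq \frac{C\lambda^2}{2}D(P\|Q), \quad \lambda \in [-1,0] \tag{21}$$

$$\psi_Q(\lambda) + D(Q\|P)\lambda \leq \frac{C\lambda^2}{2}D(Q\|P), \quad \lambda \in [-1,1] \tag{22}$$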


Remark 9. Assumption 2 is weaker than the assumption that $L$ is sub-Gaussian with scale parameter
$D(P\|Q)$ under $P$ and with scale parameter $D(Q\|P)$ under $Q$. A sub-Gaussian assumption
would correspond to requiring (21) and (22) to hold for all $\lambda \in \mathbb{R}$.

Assumption 3. For some constant $C$:

$$E_P\big((1 - \eta)D(P\|Q)\big) \geq \frac{\eta^2}{2C}D(P\|Q), \quad \eta \in [0,1] \tag{23}$$

$$E_Q\big(-(1 - \eta)D(Q\|P)\big) \geq \frac{\eta^2}{2C}D(Q\|P), \quad \eta \in [0,1]. \tag{24}$$
Lemma 2. Assumption 1 implies Assumption 2, which implies Assumption 3, with the same constant
$C$ throughout. Any of these assumptions implies that:

$$\min\{D(P\|Q), D(Q\|P)\} \geq C(P,Q) \geq \frac{1}{2C}\max\{D(P\|Q), D(Q\|P)\}, \tag{25}$$

and hence also that $D(P\|Q) \asymp D(Q\|P) \asymp C(P,Q)$.

Proof. Assumption 1 $\Rightarrow$ Assumption 2: Condition (21) is implied by Assumption 1 because $\psi_P(0) = 0$
and $\psi_P'(0) = D(P\|Q)$, so by the integral form of Taylor's theorem, $\psi_P(\lambda) - D(P\|Q)\lambda$ is $\lambda^2/2$
times a weighted average of $\psi_P''$ over the interval $[\lambda, 0]$ for $\lambda \in [-1,0]$, and $\psi_P''(\lambda) = \psi_Q''(\lambda + 1) \leq C D(P\|Q)$ there. Similarly, (22) is implied
by Assumption 1 because $\psi_Q(\lambda) + D(Q\|P)\lambda$ is $\lambda^2/2$ times a weighted average of $\psi_Q''$ over the interval with
endpoints 0 and $\lambda$, for $\lambda \in [-1,1]$.

Assumption 2 $\Rightarrow$ Assumption 3: Since $\psi_P(-1) = \psi_Q(1) = 0$, either (21) or (22) implies that
$C \geq 2$, which is achieved in the Gaussian case. Condition (21) implies

$$E_P((1 - \eta)D(P\|Q)) = \sup_{\lambda \in \mathbb{R}}\left(\lambda(1 - \eta)D(P\|Q) - \psi_P(\lambda)\right) \geq D(P\|Q)\sup_{\lambda \in [-1,0]}\left(-\lambda\eta - \frac{C\lambda^2}{2}\right) = \frac{\eta^2}{2C}D(P\|Q),$$

where the supremum is attained at $\lambda = -\eta/C$, which belongs to $[-1,0]$ by the fact $C \geq 2$. So (21)
implies (23). The proof that (22) implies (24) is similar.

Assumption 3 $\Rightarrow$ (25): Taking $\eta = 1$ in (23) and (24) we get $C(P,Q) \geq \frac{1}{2C}\max\{D(P\|Q), D(Q\|P)\}$.
In the other direction, $D(P\|Q) = E_Q(D(P\|Q)) \geq E_Q(0) = C(P,Q)$ and, similarly, $D(Q\|P) \geq C(P,Q)$. □
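The chain (25) can also be checked numerically for a specific pair. The sketch below estimates the smallest constant $C$ satisfying (6) on a grid, computes the Chernoff index over $\lambda \in [0,1]$, and prints the three quantities compared in (25) for the illustrative pair $P = \mathrm{Bern}(0.6)$, $Q = \mathrm{Bern}(0.3)$. The grid-based estimates are approximations.

```python
import numpy as np

p, q = 0.6, 0.3
L0, L1 = np.log((1 - p) / (1 - q)), np.log(p / q)    # LLR at x = 0 and x = 1
D_PQ = p * L1 + (1 - p) * L0                        # D(P||Q) = E_P[L]
D_QP = -(q * L1 + (1 - q) * L0)                     # D(Q||P) = -E_Q[L]

lam = np.linspace(-1.0, 1.0, 200_001)
# Q_lam(x = 1): probability of the outcome x = 1 under the tilted law Q_lam.
r = q * np.exp(lam * L1) / (q * np.exp(lam * L1) + (1 - q) * np.exp(lam * L0))
psi_second = r * (1 - r) * (L1 - L0) ** 2           # variance of the two-valued L under Q_lam
C_hat = psi_second.max() / min(D_PQ, D_QP)          # smallest C in (6), estimated on the grid

lam01 = lam[lam >= 0]
chernoff = np.max(-np.log(q ** (1 - lam01) * p**lam01 + (1 - q) ** (1 - lam01) * (1 - p) ** lam01))
print(min(D_PQ, D_QP), chernoff, max(D_PQ, D_QP) / (2 * C_hat))
```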

Recall the Chernoff upper bounds (3) and (4) on the probability of large deviations, which 
hold non-asymptotically for any sample size n and any pair P and Q. To prove the necessary 
condition for exact recovery, we need a lower bound with matching exponent. Such a result is 
well-known for fixed distributions. Indeed, the sharp asymptotics of large deviations are given by the
Bahadur-Rao theorem (see, e.g., [17, Theorem 3.7.4]); however, this result is not applicable to the
hidden community problem because both P and Q can vary with n. The following lemma provides
a non-asymptotic information-theoretic lower bound (cf. [36, Theorem 11.1] and [15, Eq. (5.21), 
p. 167]): 

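Lemma 3. If $-D(Q\|P)\le\gamma<\gamma+\delta\le D(P\|Q)$, then

$$\exp(-nE_Q(\gamma))\ge Q\Big[\sum_{k=1}^nL_k\ge n\gamma\Big]\ge\exp\left(-\frac{nE_Q(\gamma+\delta)+\log2}{1-\frac{\sup_{0\le\lambda\le1}\psi_Q''(\lambda)}{n\delta^2}}\right). \tag{26}$$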


Proof. The left inequality in (26) is the Chernoff bound (3); it remains to prove the right inequality. 
Let $E_n=\{\sum_{k=1}^nL_k\ge n\gamma\}$. For any $Q'$, the data processing inequality of KL divergence gives

$$d(Q'[E_n]\|Q[E_n])\le D(Q'^{\otimes n}\|Q^{\otimes n})=nD(Q'\|Q).$$

Using the lower bound for the binary divergence $d(p\|q)=-h(p)+p\log\frac1q+(1-p)\log\frac1{1-q}\ge-\log2+p\log\frac1q$ yields

$$d(Q'[E_n]\|Q[E_n])\ge-\log2+Q'[E_n]\log\frac{1}{Q[E_n]},$$

so that

$$Q[E_n]\ge\exp\left(-\frac{nD(Q'\|Q)+\log2}{Q'[E_n]}\right).$$

For $\lambda\in[0,1]$, the tilted distribution $Q_\lambda$ is given by $dQ_\lambda=\frac{e^{\lambda L}}{E_Q[e^{\lambda L}]}dQ=\frac{dP^\lambda dQ^{1-\lambda}}{\int dP^\lambda dQ^{1-\lambda}}$. Then for any $a\in[-D(Q\|P),D(P\|Q)]$, there exists a unique $\lambda\in[0,1]$ such that $E_{Q_\lambda}[L]=a$ and $E_Q(a)=D(Q_\lambda\|Q)$. Choosing $a=\gamma+\delta$ and $Q'=Q_\lambda$, we have


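$$1-Q_\lambda[E_n]=Q_\lambda\Big[\sum_{k=1}^nL_k<n\gamma\Big]=Q_\lambda\Big[\sum_{k=1}^n\big(L_k-E_{Q_\lambda}[L_k]\big)<-n\delta\Big]\le\frac{\mathrm{var}_{Q_\lambda}(L_1)}{n\delta^2}\le\frac{\sup_{0\le\lambda\le1}\psi_Q''(\lambda)}{n\delta^2},$$

where the first inequality is Chebyshev's inequality and the second uses $\mathrm{var}_{Q_\lambda}(L_1)=\psi_Q''(\lambda)$.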


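Consequently,

$$Q\Big[\sum_{k=1}^nL_k\ge n\gamma\Big]\ge\exp\left(-\frac{nE_Q(\gamma+\delta)+\log2}{1-\frac{\sup_{0\le\lambda\le1}\psi_Q''(\lambda)}{n\delta^2}}\right).$$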

□ 
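The two sides of (26) can be compared with an exact tail probability in the Bernoulli case $P=\mathrm{Bern}(p)$, $Q=\mathrm{Bern}(q)$ with $p>q$, where $\sum_kL_k$ is an affine function of a $\mathrm{Binom}(n,q)$ count under $Q$. The Python sketch below is illustrative only: the parameter values and helper names are assumptions of this example, $E_Q$ is computed by ternary search on the concave objective, and $\sup_{0\le\lambda\le1}\psi_Q''(\lambda)$ is evaluated in closed form as $\max r(1-r)\,(a_1-a_0)^2$, using that the tilted success probability $r$ moves monotonically from $q$ to $p$.

```python
import math

def legendre_sup(psi, theta, lo, hi, iters=200):
    """sup over lam in [lo, hi] of lam*theta - psi(lam); the objective is concave (ternary search)."""
    g = lambda lam: lam * theta - psi(lam)
    a, b = lo, hi
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if g(m1) < g(m2):
            a = m1
        else:
            b = m2
    return g((a + b) / 2)

def log_binom_upper_tail(n, q, s):
    """log P[Binom(n, q) >= s], via log-sum-exp over the pmf."""
    terms = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q) + (n - k) * math.log(1 - q) for k in range(s, n + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def lemma3_bounds(n, p, q, gamma, delta):
    """Return (log lower bound, exact log probability, log upper bound) for Q[sum L_k >= n*gamma]."""
    if not p > q:
        raise ValueError("this sketch assumes p > q")
    a1, a0 = math.log(p / q), math.log((1 - p) / (1 - q))    # LLR values for A = 1 and A = 0
    psi_Q = lambda lam: math.log(q ** (1 - lam) * p ** lam + (1 - q) ** (1 - lam) * (1 - p) ** lam)
    E_Q = lambda t: legendre_sup(psi_Q, t, 0.0, 1.0)
    # psi_Q''(lam) = r(1 - r)(a1 - a0)^2 with r the tilted success probability, r in [q, p]
    rr = 0.25 if q <= 0.5 <= p else max(q * (1 - q), p * (1 - p))
    denom = 1 - rr * (a1 - a0) ** 2 / (n * delta ** 2)
    assert denom > 0, "the right inequality in (26) is only meaningful when this is positive"
    # sum L_k = S*a1 + (n - S)*a0 >= n*gamma  <=>  S >= n*(gamma - a0)/(a1 - a0)
    s = math.ceil(n * (gamma - a0) / (a1 - a0))
    exact = log_binom_upper_tail(n, q, s)
    upper = -n * E_Q(gamma)                                    # Chernoff bound (3)
    lower = -(n * E_Q(gamma + delta) + math.log(2)) / denom    # right inequality of (26)
    return lower, exact, upper
```

For example, with $n=2000$, $p=0.3$, $q=0.2$, $\gamma=0$ and $\delta=0.01$ (which satisfy $\gamma+\delta\le D(P\|Q)$), the exact log-probability should fall between the two bounds, with the lower bound noticeably looser, as expected from a non-asymptotic estimate.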


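Corollary 5. If Assumption 1 holds and $-D(Q\|P)\le\gamma<\gamma+\delta\le D(P\|Q)$, then

$$\exp(-nE_Q(\gamma))\ge Q\Big[\sum_{k=1}^nL_k\ge n\gamma\Big]\ge\exp\left(-\frac{nE_Q(\gamma+\delta)+\log2}{1-\frac{C\min\{D(P\|Q),D(Q\|P)\}}{n\delta^2}}\right).$$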


4 Weak Recovery for General P/Q Model 

Theorem 1 is proved in Section 4.1. Section 4.2 provides a modification of the sufficiency part of 
Theorem 1 giving a sufficient condition for weak recovery with random cluster size; it is used in 
Section 5 to prove sufficient conditions for exact recovery. 


4.1 Proof of Theorem 1 

Remark 10. The sufficiency proof only uses (23), while the necessity proof only uses (24). The sufficiency proof is based on analyzing the MLE via a delicate application of the union bound and the large deviation upper bounds (3) and (4). For the necessity part, the proof of the first condition in (8) uses a genie argument and the theory of binary hypothesis testing, while the proof of the second condition in (8) is based on mutual information and the rate-distortion function.

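Sufficiency. We let $\hat C$ denote the MLE, $\hat C_{\rm ML}$, for brevity in the proof. Let $L=|\hat C\cap C^*|$ and $\epsilon=1/\sqrt{KD(P\|Q)}$. Since $K\ge2$ and $(K-1)D(P\|Q)\to\infty$ by assumption, we have $\epsilon=o(1)$. Since $|\hat C|=|C^*|=K$ and hence $|\hat C\triangle C^*|=2(K-L)$, it suffices to show that $\mathbb P\{L\le(1-\epsilon)K\}\le\exp(-\Omega(K/\epsilon))$, because then $\mathbb E|\hat C\triangle C^*|\le2\epsilon K+2K\,\mathbb P\{L\le(1-\epsilon)K\}=o(K)$.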

Note that

$$e(\hat C,\hat C)-e(C^*,C^*)=e(\hat C\setminus C^*,\hat C\setminus C^*)+e(\hat C\setminus C^*,\hat C\cap C^*)-e(C^*\setminus\hat C,C^*) \tag{27}$$

and $|C^*\setminus\hat C|=|\hat C\setminus C^*|=K-L$. Fix $\theta\in[-D(Q\|P),D(P\|Q)]$ whose value will be chosen later. Then for any $0\le\ell\le K-1$,


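$$\begin{aligned}\{L=\ell\}&\subset\{\exists C\subset[n]:|C|=K,\ |C\cap C^*|=\ell,\ e(C,C)\ge e(C^*,C^*)\}\\&=\{\exists S\subset C^*,T\subset(C^*)^c:|S|=|T|=K-\ell,\ e(S,C^*)\le e(T,T)+e(T,C^*\setminus S)\}\\&\subset\{\exists S\subset C^*:|S|=K-\ell,\ e(S,C^*)\le m\theta\}\\&\quad\cup\{\exists S\subset C^*,T\subset(C^*)^c:|S|=|T|=K-\ell,\ e(T,T)+e(T,C^*\setminus S)\ge m\theta\},\end{aligned}$$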

where $m=\binom K2-\binom\ell2$. Notice that $e(S,C^*)$ has the same distribution as $\sum_{i=1}^mL_i$ under measure $P$; $e(T,T)+e(T,C^*\setminus S)$ has the same distribution as $\sum_{i=1}^mL_i$ under measure $Q$, where the $L_i$ are i.i.d. copies of $\log\frac{dP}{dQ}$. Hence, by the union bound and the large deviation bounds (3) and (4),


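$$\begin{aligned}\mathbb P\{L=\ell\}&\le\binom{K}{K-\ell}P\Big[\sum_{i=1}^mL_i\le m\theta\Big]+\binom{K}{K-\ell}\binom{n-K}{K-\ell}Q\Big[\sum_{i=1}^mL_i\ge m\theta\Big]\\&\le\Big(\frac{Ke}{K-\ell}\Big)^{K-\ell}e^{-mE_P(\theta)}+\Big(\frac{Ke}{K-\ell}\Big)^{K-\ell}\Big(\frac{(n-K)e}{K-\ell}\Big)^{K-\ell}e^{-mE_Q(\theta)},\end{aligned}$$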

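where the last inequality holds due to the fact that $\binom ab\le(ea/b)^b$. Notice that $m=(K-\ell)(K+\ell-1)/2\ge(K-\ell)(K-1)/2$. Thus, for any $\ell\le(1-\epsilon)K$, using $K-\ell\ge\epsilon K$,

$$\mathbb P\{L=\ell\}\le e^{-(K-\ell)E_1}+e^{-(K-\ell)E_2}, \tag{28}$$

where

$$E_1\triangleq(K-1)E_P(\theta)/2-\log\frac e\epsilon,\qquad E_2\triangleq(K-1)E_Q(\theta)/2-\log\frac{ne^2}{K\epsilon^2}.$$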

By the assumption (7), we have $(K-1)D(P\|Q)(1-\eta)\ge2\log\frac nK$ for some $\eta\in(0,1)$ and all sufficiently large $n$. Choose $\theta=(1-\eta)D(P\|Q)$. By the assumption (23), we have

$$E_1\ge\frac{\eta^2}{4C}(K-1)D(P\|Q)-\log\frac e\epsilon.$$

Using the fact that $E_P(\theta)=E_Q(\theta)-\theta$, we have

$$E_2\ge\frac{\eta^2}{4C}(K-1)D(P\|Q)-2\log\frac e\epsilon+\frac{K-1}{2}D(P\|Q)(1-\eta)-\log\frac nK\ge\frac{\eta^2}{4C}(K-1)D(P\|Q)-2\log\frac e\epsilon.$$

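Therefore, in view of $\epsilon=1/\sqrt{KD(P\|Q)}$, it follows that $E\triangleq\min\{E_1,E_2\}=\Omega(KD(P\|Q))=\Omega(\epsilon^{-2})$. Hence, in view of (28),

$$\mathbb P\{L\le(1-\epsilon)K\}=\sum_{\ell=0}^{(1-\epsilon)K}\mathbb P\{L=\ell\}\le\sum_{\ell=\epsilon K}^{\infty}2e^{-\ell E}\le\frac{2\exp(-\epsilon KE)}{1-\exp(-E)}=\exp(-\Omega(K/\epsilon)).$$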
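To make the estimator analyzed above concrete, here is a brute-force Python sketch of $\hat C_{\rm ML}=\arg\max_{|C|=K}e(C,C)$ in the Gaussian case $P=\mathcal N(\mu,1)$, $Q=\mathcal N(0,1)$, where $L_{ij}=\mu A_{ij}-\mu^2/2$. Enumerating all $\binom nK$ candidate sets is only feasible for tiny $n$; the sizes, seed and signal strength are illustrative assumptions chosen so that the planted set is recovered with overwhelming probability, and the function names are hypothetical.

```python
import itertools
import random

def planted_gaussian(n, K, mu, seed=0):
    """Symmetric n x n matrix with zero diagonal; entries inside a random K-set have mean mu."""
    rng = random.Random(seed)
    community = set(rng.sample(range(n), K))
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mean = mu if (i in community and j in community) else 0.0
            A[i][j] = A[j][i] = rng.gauss(mean, 1.0)
    return A, community

def mle_brute_force(A, K, mu):
    """argmax over |C| = K of e(C, C) = sum over pairs i < j in C of (mu * A_ij - mu^2 / 2)."""
    n = len(A)
    def e(C):
        return sum(mu * A[i][j] - mu ** 2 / 2 for i, j in itertools.combinations(C, 2))
    return set(max(itertools.combinations(range(n), K), key=e))
```

Since $\mu^2/2$ is subtracted once per pair and every candidate has $\binom K2$ pairs, the maximizer is the same as that of $\sum_{i<j\in C}A_{ij}$; the explicit LLR form is kept only to mirror $e(C,C)$.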
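Necessity. Given $i,j\in[n]$, let $\xi_{\setminus i,j}$ denote $\{\xi_k:k\ne i,j\}$. Consider the following binary hypothesis testing problem for determining $\xi_i$. If $\xi_i=0$, a node $J$ is randomly and uniformly chosen from $\{j:\xi_j=1\}$, and we observe $(A,J,\xi_{\setminus i,J})$; if $\xi_i=1$, a node $J$ is randomly and uniformly chosen from $\{j:\xi_j=0\}$, and we observe $(A,J,\xi_{\setminus i,J})$. Note that

$$\frac{\mathbb P(A,J,\xi_{\setminus i,J}\mid\xi_i=0)}{\mathbb P(A,J,\xi_{\setminus i,J}\mid\xi_i=1)}=\frac{\mathbb P(A,\xi_{\setminus i,J}\mid J,\xi_i=0)}{\mathbb P(A,\xi_{\setminus i,J}\mid J,\xi_i=1)}=\frac{\mathbb P(A\mid J,\xi_{\setminus i,J},\xi_i=0)}{\mathbb P(A\mid J,\xi_{\setminus i,J},\xi_i=1)}=\prod_{k\in[n]\setminus\{i,J\}:\,\xi_k=1}\frac{P(A_{Jk})\,Q(A_{ik})}{P(A_{ik})\,Q(A_{Jk})},$$

where the first equality holds because $\mathbb P\{J\mid\xi_i=0\}=\mathbb P\{J\mid\xi_i=1\}$, and the second equality holds because $\mathbb P\{\xi_{\setminus i,J}\mid\xi_i=0,J\}=\mathbb P\{\xi_{\setminus i,J}\mid\xi_i=1,J\}$. Let $T$ denote the vector consisting of $A_{ik}$ and $A_{Jk}$ for all $k\in[n]\setminus\{i,J\}$ such that $\xi_k=1$. Then $T$ is a sufficient statistic of $(A,J,\xi_{\setminus i,J})$ for testing $\xi_i=1$ and $\xi_i=0$. Note that if $\xi_i=0$, $T$ is distributed as $Q^{\otimes(K-1)}\otimes P^{\otimes(K-1)}$; if $\xi_i=1$, $T$ is distributed as $P^{\otimes(K-1)}\otimes Q^{\otimes(K-1)}$. Thus, equivalently, we are testing $Q^{\otimes(K-1)}\otimes P^{\otimes(K-1)}$ versus $P^{\otimes(K-1)}\otimes Q^{\otimes(K-1)}$; let $\mathcal E$ denote the optimal average probability of testing error. Then we have the following chain of inequalities:

$$\mathbb E[d_H(\xi,\hat\xi)]\ge\sum_{i=1}^n\min_{\hat\xi_i(A)}\mathbb P[\xi_i\ne\hat\xi_i]\ge\sum_{i=1}^n\min_{\hat\xi_i(A,J,\xi_{\setminus i,J})}\mathbb P[\xi_i\ne\hat\xi_i]=n\mathcal E. \tag{29}$$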

By the assumption $\mathbb E[d_H(\xi,\hat\xi)]=o(K)$, it follows that $\mathcal E=o(K/n)$. Since $K/n$ is bounded away from one, this implies that the sum of Type-I and Type-II probabilities of error $p_{e,0}+p_{e,1}=o(1)$, which is equivalent to $\mathrm{TV}\big((P\otimes Q)^{\otimes(K-1)},(Q\otimes P)^{\otimes(K-1)}\big)\to1$, where $\mathrm{TV}(P,Q)=\int|dP-dQ|/2$ denotes the total variation distance. Using $D(P\|Q)\ge\log\frac{1}{2(1-\mathrm{TV}(P,Q))}$ and the tensorization property of KL divergence for product distributions, we have $(K-1)(D(P\|Q)+D(Q\|P))\to\infty$.

By the assumption (24) and the fact that $E_Q(\theta)$ is non-decreasing in $\theta\in[-D(Q\|P),D(P\|Q)]$, it follows that

$$D(P\|Q)=E_Q(D(P\|Q))\ge E_Q(-D(Q\|P)/2)\ge\frac{1}{8C}D(Q\|P).$$

Hence, we have $(K-1)D(P\|Q)\to\infty$, which implies $KD(P\|Q)\to\infty$.

Next we show the second condition in (8) is necessary. Let $H(X)$ denote the entropy of a discrete random variable $X$ and $I(X;Y)$ denote the mutual information between random variables $X$ and $Y$. Let $\xi=(\xi_1,\dots,\xi_n)$ be uniformly drawn from the set $\{x\in\{0,1\}^n:w(x)=K\}$, where $w(x)=\sum_ix_i$ denotes the Hamming weight; therefore the $\xi_i$'s are individually Bern$(K/n)$. Let $\mathbb E[d_H(\xi,\hat\xi)]=\epsilon_nK$, where $\epsilon_n\to0$ by assumption. Consider the following chain of inequalities, which lower bounds the amount of information required for a distortion level $\epsilon_n$:

$$I(A;\xi)\overset{(a)}{\ge}I(\hat\xi;\xi)\ge\min_{\mathbb E[d_H(\tilde\xi,\xi)]\le\epsilon_nK}I(\tilde\xi;\xi)\ge H(\xi)-\max_{\mathbb E[d_H(\tilde\xi,\xi)]\le\epsilon_nK}H(\tilde\xi\oplus\xi)\overset{(b)}{\ge}\log\binom nK-nh\Big(\frac{\epsilon_nK}{n}\Big)\overset{(c)}{\ge}K\log\frac nK\,(1+o(1)),$$

where (a) follows from the data processing inequality, (b) is due to the fact that $\max_{\mathbb E[w(X)]\le pn}H(X)=nh(p)$ for any $p\le1/2$, where $h(p)=p\log\frac1p+(1-p)\log\frac1{1-p}$ is the binary entropy function, and

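To see that $\max_{\mathbb E[w(X)]\le pn}H(X)=nh(p)$, simply note that $H(X)\le\sum_{i=1}^nH(X_i)\le nh\big(\sum_{i=1}^n\mathbb P\{X_i=1\}/n\big)\le nh(p)$ by Jensen's inequality and the monotonicity of $h$ on $[0,1/2]$, which is attained with equality when the $X_i$'s are i.i.d. Bern$(p)$.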
(c) follows from the bound $\binom nK\ge(\frac nK)^K$, the assumption that $K/n$ is bounded away from one, and the bound $h(p)\le-p\log p+p$ for $p\in[0,1]$. Moreover,

$$I(A;\xi)=\min_{Q_A}D(P_{A|\xi}\|Q_A\mid P_\xi)\le D\big(P_{A|\xi}\,\big\|\,Q^{\otimes\binom n2}\,\big|\,P_\xi\big)=\binom K2D(P\|Q). \tag{30}$$

Combining the last two displays, we get that $\liminf_{n\to\infty}\frac{(K-1)D(P\|Q)}{\log\frac nK}\ge2$. □
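The quantities compared in the last two displays are easy to evaluate. The Python sketch below (all function names are our own, logarithms are natural) computes the right-hand side of step (b), the bound (30), and the smallest $D(P\|Q)$ for which (30) can meet (b); it also spot-checks the two elementary bounds used in step (c), $\binom nK\ge(n/K)^K$ and $h(p)\le-p\log p+p$. It is an illustration of the arithmetic, not part of the proof.

```python
import math

def h(p):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def log_binom(n, k):
    """log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def info_lower_bound(n, K, eps):
    """log C(n, K) - n h(eps K / n): the right-hand side of step (b)."""
    return log_binom(n, K) - n * h(eps * K / n)

def info_upper_bound(K, D):
    """C(K, 2) D(P||Q): the right-hand side of (30)."""
    return K * (K - 1) / 2 * D

def necessary_D(n, K, eps):
    """Smallest D(P||Q) for which (30) is at least the lower bound from step (b)."""
    return info_lower_bound(n, K, eps) / (K * (K - 1) / 2)
```

Dividing `necessary_D(n, K, eps)` by $2\log(n/K)/(K-1)$ gives a finite-$n$ version of the ratio appearing in the second condition of (8).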

Remark 11. The hidden community model (Definition 1) adopted in this paper assumes the data matrix $A$ has zero diagonal, meaning that we observe no self information about the individual vertices, only pairwise information. A different assumption used in the literature for the Gaussian submatrix localization problem is that $A_{ii}$ has distribution $P$ if $i\in C^*$ and distribution $Q$ otherwise. Theorem 1 holds for that case with the modification that the factors $K-1$ in (7) and (8) are replaced by $K+1$. We explain briefly why the modified theorem is true. The proof for the sufficient part goes through with the definition of $e(S,T)$ in (1) modified to include diagonal terms indexed by $S\cap T$: $e(S,T)=\sum_{(i\le j):(i,j)\in(S\times T)\cup(T\times S)}L_{ij}$. Then $m$ increases by $K-\ell$, resulting in $K-1$ replaced by $K+1$ in $E_1$ and $E_2$. As for the necessary conditions, the proof of the first part of (8) goes through with the sufficient statistic $T$ extended to include two more variables, $A_{ii}$ and $A_{JJ}$, which has the effect of increasing $K$ by one, so the first part of (8) holds with $K$ replaced by $K+1$; but the first part of (8) has the same meaning whether or not $K$ is replaced by $K+1$. The proof of the second part of (8) goes through with $\binom K2$ replaced by $1+\dots+K=\binom{K+1}2$ in (30), which has the effect of changing $K-1$ to $K+1$ in the second part of (8). The necessary conditions and the sufficient conditions for exact recovery stated in the next section hold without modification for the model with diagonal elements. In the proof of Lemma 6, the term $e(i,C^*)$ in the definition of $F$, (40), should include the term $L_{ii}$, and the random variable $X_i$ used in that proof should be changed to $X_i=e(i,\{1,\dots,i\})$, so that it also includes the term $L_{ii}$.

4.2 A Sufficient Condition For Weak Recovery With Random Cluster Size 

Theorem 1 invokes the assumption that $|C^*|=K$ and $K$ is known. In the proof of exact recovery, as we will see, we need to deal with the case where $|C^*|$ is random and unknown. For that reason, the following lemma gives a sufficient condition for weak recovery with a random cluster size. We shall continue to use $\hat C_{\rm ML}$ to denote the estimator defined by (2), although in this context it is not actually the MLE because $|C^*|$ need not be $K$. That is, there is a (slight) mismatch between the problem the estimator was designed for and the problem it is applied to.

Lemma 4 (Sufficient condition for weak recovery with random cluster size). Assume that $K\to\infty$, $\limsup K/n<1$, and there exists a universal constant $C>0$ such that (23) holds. Furthermore, suppose that

$$\mathbb P\big\{\big||C^*|-K\big|\le K/\log K\big\}\ge1-o(1).$$

If (7) holds, then

$$\mathbb P\big\{|\hat C_{\rm ML}\triangle C^*|\le2K\epsilon+3K/\log K\big\}\ge1-o(1),$$

where $\epsilon=1/\sqrt{\min\{\log K,KD(P\|Q)\}}$.
Proof. By assumption, with probability converging to 1, $\big||C^*|-K\big|\le K/\log K$. In the following, we assume that $|C^*|=K'$ for $|K'-K|\le K/\log K$. Let $L=|\hat C_{\rm ML}\cap C^*|$. Then $|\hat C_{\rm ML}\triangle C^*|=K+K'-2L$. To prove the lemma, it suffices to show that $\mathbb P\{L<(1-\epsilon)K-|K'-K|\}=o(1)$, where $\epsilon$ is defined in the statement of the lemma, because on the complementary event $K+K'-2L\le2\epsilon K+3|K'-K|\le2\epsilon K+3K/\log K$. Following the proof of Theorem 1 in the fixed cluster size case, we get that for all $0\le\ell\le K-1$,

$$\begin{aligned}\{L=\ell\}&\subset\{\exists C\subset[n]:|C|=K,\ |C\cap C^*|=\ell,\ e(C,C)\ge e(C^*,C^*)\}\\&=\{\exists S\subset C^*,T\subset(C^*)^c:|S|=K'-\ell,\ |T|=K-\ell,\ e(S,C^*)\le e(T,T)+e(T,C^*\setminus S)\}\\&\subset\{\exists S\subset C^*:|S|=K'-\ell,\ e(S,C^*)\le m\theta\}\\&\quad\cup\{\exists S\subset C^*,T\subset(C^*)^c:|S|=K'-\ell,\ |T|=K-\ell,\ e(T,T)+e(T,C^*\setminus S)\ge m\theta\},\end{aligned}$$


where $\theta\in[-D(Q\|P),D(P\|Q)]$ is chosen later. Notice that $e(S,C^*)$ has the same distribution as $\sum_{i=1}^{m'}L_i$ under measure $P$; $e(T,T)+e(T,C^*\setminus S)$ has the same distribution as $\sum_{i=1}^mL_i$ under measure $Q$, where $m'=\binom{K'}2-\binom\ell2$, $m=\binom K2-\binom\ell2$, and the $L_i$ are i.i.d. copies of $\log\frac{dP}{dQ}$. Hence, by the union bound and the large deviation bounds in (3) and (4),


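$$\begin{aligned}\mathbb P\{L=\ell\}&\le\binom{K'}{K'-\ell}P\Big[\sum_{i=1}^{m'}L_i\le m\theta\Big]+\binom{K'}{K'-\ell}\binom{n-K'}{K-\ell}Q\Big[\sum_{i=1}^mL_i\ge m\theta\Big]\\&\le\Big(\frac{K'e}{K'-\ell}\Big)^{K'-\ell}e^{-m'E_P(m\theta/m')}+\Big(\frac{K'e}{K'-\ell}\Big)^{K'-\ell}\Big(\frac{(n-K')e}{K-\ell}\Big)^{K-\ell}e^{-mE_Q(\theta)}.\end{aligned}$$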


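Notice that for any $\ell\le(1-\epsilon)K-|K-K'|$, we have $K'-\ell\ge\epsilon\max\{K',K\}$, $K-\ell\ge\epsilon K$, and

$$\frac{K-\ell}{K+K/\log K-\ell}\le\frac{K-\ell}{K'-\ell}\le\frac{K-\ell}{K-K/\log K-\ell}\le\frac{K-(1-\epsilon)K}{K-K/\log K-(1-\epsilon)K}.$$

The right-most bound equals $\epsilon/(\epsilon-1/\log K)$, and, because $\ell\le(1-\epsilon)K$, the left-most term is at least $\epsilon/(\epsilon+1/\log K)$. Since $\epsilon\ge1/\sqrt{\log K}$ and $K\to\infty$, it follows that $(K-\ell)/(K'-\ell)=1+o(1)$. Also,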

$$m'=(K'-\ell)(K'+\ell-1)/2\ge(K'-\ell)(K'-1)/2,\qquad m=(K-\ell)(K+\ell-1)/2\ge(K-\ell)(K-1)/2.$$

Therefore, $m/m'\to1$, and, moreover,

$$\mathbb P\{L=\ell\}\le e^{-(K'-\ell)(1+o(1))E_1}+e^{-(K-\ell)(1+o(1))E_2},$$

with

$$E_1=KE_P(m\theta/m')/2-\log\frac e\epsilon,\qquad E_2=KE_Q(\theta)/2-\log\frac{ne^2}{K\epsilon^2}.$$

By the assumption (7), we have $KD(P\|Q)(1-\eta)\ge2\log\frac nK$ for some $\eta\in(0,1)$ and all sufficiently large $n$. Choose $\theta=(1-\eta)D(P\|Q)$. By (23), we have that $E_P(\theta)\ge\frac{\eta^2}{2C}D(P\|Q)$ and $E_P(m\theta/m')\ge(1+o(1))\frac{\eta^2}{2C}D(P\|Q)$. Thus,

$$E_1\ge(1+o(1))\frac{\eta^2}{4C}KD(P\|Q)-\log\frac e\epsilon.$$

Using the fact that $E_P(\theta)=E_Q(\theta)-\theta$, we get that

$$E_2\ge\frac{\eta^2}{4C}KD(P\|Q)-2\log\frac e\epsilon+\frac K2D(P\|Q)(1-\eta)-\log\frac nK\ge\frac{\eta^2}{4C}KD(P\|Q)-2\log\frac e\epsilon.$$
Since $KD(P\|Q)\to\infty$ by assumption and $\epsilon\ge1/\sqrt{KD(P\|Q)}$, it follows that $E\triangleq\min\{E_1,E_2\}=\Omega(KD(P\|Q))$. Therefore,

$$\mathbb P\{L<(1-\epsilon)K-|K'-K|\}\le\sum_{\ell=0}^{(1-\epsilon)K}\Big(e^{-(K'-\ell)(1+o(1))E_1}+e^{-(K-\ell)(1+o(1))E_2}\Big)\le2\sum_{\ell=\epsilon K}^{\infty}e^{-(1+o(1))\ell E}\le\exp\big(-\Omega(\sqrt{KD(P\|Q)})\big)=o(1),$$

as was to be proved. □

5 Exact Recovery for General P/Q Model 

The sufficiency and necessity halves of Theorem 2 are proved in Sections 5.1 and 5.2, respectively. 

5.1 The Sufficient Condition and the Voting Procedure 

This section proves the sufficiency part of Theorem 2. The proof is based on a two-step procedure for exact recovery, described as Algorithm 1. The first main step of the algorithm (approximate recovery) uses an estimator capable of weak recovery, even with a slight mismatch between $|C^*|$ and $K$, such as provided by the ML estimator (see Lemma 4). The second main step cleans up the residual errors through a local voting procedure for each index. In order to make sure the first and second steps are independent of each other, we use the method of successive withholding.

This method of proof highlights (13) as the sufficient condition for when the local voting procedure succeeds. In fact, it permits us to prove an intermediate result, Theorem 3 below, which can be used to show that weak recovery plus cleanup in linear additional time yields exact recovery no matter how the weak recovery step is achieved. In particular, [22] and [24] give conditions for message passing algorithms to achieve weak recovery in (near linear) polynomial time, and they invoke Theorem 3 to note that, if (13) holds, exact recovery can be achieved with the addition of the linear time cleanup step.

Algorithm 1 Weak recovery plus cleanup for exact recovery

1: Input: $n\in\mathbb N$, $K>0$, distributions $P,Q$; observed matrix $A$; $\delta\in(0,1)$ with $1/\delta,n\delta\in\mathbb N$.

2: (Partition) Partition $[n]$ into $1/\delta$ subsets $S_1,\dots,S_{1/\delta}$ of size $n\delta$.

3: (Approximate Recovery) For each $k=1,\dots,1/\delta$, let $A_k$ denote the restriction of $A$ to the rows and columns with index in $[n]\setminus S_k$, run an estimator capable of weak recovery with input $(n(1-\delta),\lceil K(1-\delta)\rceil,P,Q,A_k)$ and let $\hat C_k$ denote the output.

4: (Cleanup) For each $k=1,\dots,1/\delta$ compute $r_i=\sum_{j\in\hat C_k}L_{ij}$ for all $i\in S_k$, and return $\hat C$, the set of $K$ indices in $[n]$ with the largest values of $r_i$.
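A minimal Python sketch of Algorithm 1 in the Gaussian case, where $L_{ij}=\mu A_{ij}-\mu^2/2$, is given below. As an assumption made purely for illustration, the weak-recovery step is a hypothetical stand-in (keep the $\lceil K(1-\delta)\rceil$ indices of $A_k$ with the largest row sums) rather than the ML estimator of Lemma 4; any estimator satisfying (31) could be plugged in instead. The partition uses consecutive index blocks, which is harmless because $C^*$ is uniformly random, and all names and parameter values are arbitrary.

```python
import math
import random

def plant(n, K, mu, rng):
    """Gaussian hidden community: A_ij ~ N(mu, 1) inside C*, N(0, 1) otherwise, zero diagonal."""
    C_star = set(rng.sample(range(n), K))
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = rng.gauss(mu if i in C_star and j in C_star else 0.0, 1.0)
    return A, C_star

def row_sum_estimator(A, idx, size):
    """Hypothetical weak-recovery stand-in: top-`size` indices of idx by row sum within idx."""
    score = {i: sum(A[i][j] for j in idx if j != i) for i in idx}
    return set(sorted(idx, key=score.get, reverse=True)[:size])

def weak_plus_cleanup(A, K, mu, num_blocks):
    """Algorithm 1 with delta = 1 / num_blocks."""
    n = len(A)
    assert n % num_blocks == 0, "this sketch needs n * delta to be an integer"
    block = n // num_blocks                            # n * delta
    llr = lambda a: mu * a - mu ** 2 / 2               # L_ij for P = N(mu, 1), Q = N(0, 1)
    size = math.ceil(K * (1 - 1 / num_blocks))         # ceil(K (1 - delta))
    r = [0.0] * n
    for k in range(num_blocks):
        S_k = range(k * block, (k + 1) * block)        # step 2: partition
        rest = [i for i in range(n) if not k * block <= i < (k + 1) * block]
        C_k = row_sum_estimator(A, rest, size)         # step 3: approximate recovery on A_k
        for i in S_k:                                  # step 4: r_i = sum_{j in C_k} L_ij
            r[i] = sum(llr(A[i][j]) for j in C_k)
    return set(sorted(range(n), key=lambda i: r[i], reverse=True)[:K])
```

Note that each $r_i$ with $i\in S_k$ only uses entries $A_{ij}$ with $j\notin S_k$, which are independent of $A_k$; this is the independence that successive withholding is meant to secure.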


The following theorem gives sufficient conditions under which the two-step procedure achieves 
exact recovery, assuming the first step provides weak recovery. 

Theorem 3. Suppose $\hat C$ is produced by Algorithm 1 using estimators for weak recovery $\hat C_k$ such that

$$\mathbb P\big\{|\hat C_k\triangle C_k^*|\le\delta K\text{ for }1\le k\le1/\delta\big\}\to1 \tag{31}$$

as $n\to\infty$, where $C_k^*=C^*\cap([n]\setminus S_k)$. Suppose also that Assumption 1 holds (or the weaker conditions (22) and (23) hold) and that (13) holds. Then $\mathbb P\{\hat C=C^*\}\to1$ as $n\to\infty$.

In the proof of Lemma 4, the $o(1)$ terms converge to zero as $(K-\ell)/(K'-\ell)\to1$ and $m/m'\to1$, uniformly in $\ell$ for $0\le\ell\le(1-\epsilon)K-|K-K'|$.

The proof of Theorem 3 is given after the following lemma. 

Lemma 5. Suppose Assumption 1 holds (or the weaker condition (22) holds) and (13) holds. Let $\{X_i\}$ denote a sequence of i.i.d. copies of $\log\frac{dP}{dQ}$ under measure $P$. Let $\{Y_i\}$ denote another sequence of i.i.d. copies of $\log\frac{dP}{dQ}$ under measure $Q$, which is independent of $\{X_i\}$. Then for $\delta$ sufficiently small and $\gamma=\frac1K\log\frac nK$,

$$\mathbb P\Big\{\sum_{i=1}^{K(1-2\delta)}X_i+\sum_{i=1}^{K\delta}Y_i\le K(1-\delta)\gamma\Big\}=o(1/K) \tag{32}$$

$$\mathbb P\Big\{\sum_{i=1}^{K(1-\delta)}Y_i\ge K(1-\delta)\gamma\Big\}=o(1/(n-K)). \tag{33}$$


Proof. By the assumption (13), there exists $\epsilon>0$ sufficiently small such that $KE_Q(\gamma)\ge(1+\epsilon)\log n$ for all sufficiently large $n$. We restrict attention to such $n$. First of all,

$$\mathbb P\Big\{\sum_{i=1}^{K(1-\delta)}Y_i\ge K(1-\delta)\gamma\Big\}\le\exp(-K(1-\delta)E_Q(\gamma))\le n^{-(1-\delta)(1+\epsilon)}.$$

Then (33) holds as long as $\delta<\epsilon/(1+\epsilon)$. To show (32), for any $t>0$, the Chernoff bound yields


$$\mathbb P\Big\{\sum_{i=1}^{K(1-2\delta)}X_i+\sum_{i=1}^{K\delta}Y_i\le K(1-\delta)\gamma\Big\}\le\exp\big(K(1-2\delta)(\psi_P(-t)+\gamma t)+K\delta(\psi_Q(-t)+\gamma t)\big).$$

Since $E_P(\gamma)=\sup_{-1\le\lambda\le0}\lambda\gamma-\psi_P(\lambda)$, choose $t\in[0,1]$ so that $\psi_P(-t)+\gamma t=-E_P(\gamma)=-E_Q(\gamma)+\gamma$. Since $\lambda\mapsto\psi_Q(\lambda)$ is convex with $\psi_Q(0)=\psi_Q(1)=0$ (hence nonincreasing on $(-\infty,0]$), it follows that

$$\psi_Q(-t)\le\psi_Q(-1)\le D(Q\|P)(1+C/2), \tag{34}$$

where the last inequality follows from (22) with $\lambda=-1$. Note that (24) is implied by (22). It follows from (24) that $E_Q(\gamma)\ge E_Q(0)\ge\frac{1}{2C}D(Q\|P)$. Together with (34), it yields that $\psi_Q(-t)\le C(C+2)E_Q(\gamma)$. Let $C'=C(C+2)$. Combining the above gives


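$$\begin{aligned}\mathbb P\Big\{\sum_{i=1}^{K(1-2\delta)}X_i+\sum_{i=1}^{K\delta}Y_i\le K(1-\delta)\gamma\Big\}&\le\exp\big(-K(1-2\delta)E_P(\gamma)+K\delta C'E_Q(\gamma)+K\delta\gamma\big)\\&=\exp\big(-K(1-(C'+2)\delta)E_P(\gamma)+K\delta(1+C')\gamma\big)\\&\le\exp\big(-(1-(C'+2)\delta)(\log K+\epsilon\log n)+\delta(1+C')\log n\big),\end{aligned}$$

where the last inequality follows from the fact that $KE_P(\gamma)=\log K-\log n+KE_Q(\gamma)\ge\log K+\epsilon\log n$ and $K\gamma=\log\frac nK\le\log n$. Therefore, as long as $(1-(C'+2)\delta)(1+\epsilon/2)\ge1$ and $\delta(1+C')\le(\epsilon/3)/(1+\epsilon/2)$,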


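$$\mathbb P\Big\{\sum_{i=1}^{K(1-2\delta)}X_i+\sum_{i=1}^{K\delta}Y_i\le K(1-\delta)\gamma\Big\}\le\exp\Big(-\frac{1}{1+\epsilon/2}\Big(\log K+\frac{2\epsilon}{3}\log n\Big)\Big),$$

so that (32) holds: since $n\ge K$, the right-hand side is at most $K^{-(1+2\epsilon/3)/(1+\epsilon/2)}=o(1/K)$ when $K\to\infty$, and it is $o(1)$ when $K$ is bounded because $n\to\infty$.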


□ 
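The proof of Lemma 5 needs one $\delta$ meeting three constraints: $\delta<\epsilon/(1+\epsilon)$ for (33), and $(1-(C'+2)\delta)(1+\epsilon/2)\ge1$ together with $\delta(1+C')\le(\epsilon/3)/(1+\epsilon/2)$ for (32), where $C'=C(C+2)$. The Python sketch below uses hypothetical helper names and picks one admissible $\delta$ (half of the first threshold, capped by the other two in closed form); it also returns the exponent $(1+2\epsilon/3)/(1+\epsilon/2)$ that governs the bound $K^{-(1+2\epsilon/3)/(1+\epsilon/2)}$ in (32).

```python
def lemma5_delta(eps, C):
    """One delta satisfying the three smallness conditions in the proof of Lemma 5."""
    Cp = C * (C + 2)                                   # C' = C(C + 2)
    d1 = 0.5 * eps / (1 + eps)                         # strictly below eps/(1+eps), for (33)
    d2 = (1 - 1 / (1 + eps / 2)) / (Cp + 2)            # (1 - (C'+2) delta)(1 + eps/2) >= 1
    d3 = (eps / 3) / ((1 + eps / 2) * (1 + Cp))        # delta (1 + C') <= (eps/3)/(1 + eps/2)
    return min(d1, d2, d3)

def lemma5_exponent(eps):
    """Exponent a with the probability in (32) at most K^(-a) when n >= K; a > 1 gives o(1/K)."""
    return (1 + 2 * eps / 3) / (1 + eps / 2)
```

The resulting $\delta$ is small (of order $\epsilon/C^2$), which is consistent with the phrase "for $\delta$ sufficiently small" in the statement of the lemma.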


The $o$ in $o(1/K)$ in Lemma 5 is understood to hold as $n\to\infty$. Thus, if $K$ is bounded, $o(1/K)$ means $o(1)$ as $n\to\infty$.
Proof of Theorem 3. Note that the conditions of Lemma 5 are satisfied, so that (32) and (33) hold. Given $(C_k^*,\hat C_k)$, each of the random variables $r_i$, $i\in S_k$, is conditionally the sum of independent random variables, each with either the distribution of $X_1$ or the distribution of $Y_1$ described in Lemma 5. Furthermore, on the event $\mathcal E_k=\{|\hat C_k\triangle C_k^*|\le\delta K\}$,

$$|\hat C_k\cap C_k^*|=|\hat C_k|-|\hat C_k\setminus C_k^*|\ge\lceil K(1-\delta)\rceil-\delta K\ge K(1-2\delta).$$

One can check by definition and the change of measure that $X_1$ is first-order stochastically greater than or equal to $Y_1$. Therefore, on the event $\mathcal E_k$, for $i\in C^*\cap S_k$, $r_i$ is stochastically greater than or equal to $\sum_{j=1}^{K(1-2\delta)}X_j+\sum_{j=1}^{K\delta}Y_j$. For $i\in S_k\setminus C^*$, $r_i$ has the same distribution as $\sum_{j=1}^{K(1-\delta)}Y_j$. Hence, by (32) and (33) and the union bound, with probability converging to 1, $r_i>K(1-\delta)\gamma$ for all $i\in C^*$ and $r_i<K(1-\delta)\gamma$ for all $i\in[n]\setminus C^*$. Therefore, $\mathbb P\{\hat C=C^*\}\to1$ as $n\to\infty$. □

Proof of Sufficiency Part of Theorem 2. If $K$ is bounded, exact recovery is the same as weak recovery, so the sufficiency part of Theorem 2 follows from the sufficiency part of Theorem 1 in that case. So assume for the remainder of the proof that $K\to\infty$.

In view of Theorem 3 it suffices to verify (31) when $\hat C_k$ for each $k$ is the MLE for $C_k^*$ based on observation of $A_k$, for $\delta$ sufficiently small. The distribution of $|C_k^*|$ is obtained by sampling the indices of the original graph without replacement. Therefore, by a result of Hoeffding [25], the distribution of $|C_k^*|$ is convex order dominated by the distribution that would result by sampling with replacement, namely, by $\mathrm{Binom}(n(1-\delta),\frac Kn)$. That is, for any convex function $\Psi$, $\mathbb E[\Psi(|C_k^*|)]\le\mathbb E[\Psi(\mathrm{Binom}(n(1-\delta),\frac Kn))]$. Therefore, Chernoff bounds for $\mathrm{Binom}(n(1-\delta),\frac Kn)$ also hold for $|C_k^*|$.

The Chernoff bounds for $X\sim\mathrm{Binom}(n,p)$ give:

$$\mathbb P\{X\ge(1+\eta)np\}\le e^{-\eta^2np/3},\qquad\forall\,0\le\eta\le1 \tag{35}$$

$$\mathbb P\{X\le(1-\eta)np\}\le e^{-\eta^2np/2},\qquad\forall\,0\le\eta\le1. \tag{36}$$


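Then,

$$\mathbb P\Big\{\big||C_k^*|-(1-\delta)K\big|\ge\frac{(1-\delta)K}{\log K}\Big\}\le\mathbb P\Big\{\Big|\mathrm{Binom}\Big(n(1-\delta),\frac Kn\Big)-(1-\delta)K\Big|\ge\frac{(1-\delta)K}{\log K}\Big\}\le e^{-\Omega(K/\log^2K)}=o(1).$$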


Since (7) holds and $K\to\infty$, it follows that

$$\liminf_{n\to\infty}\frac{\lceil(1-\delta)K\rceil D(P\|Q)}{\log\frac nK}>2$$

for any sufficiently small $\delta\in(0,1)$ with $1/\delta,n\delta\in\mathbb N$. Hence, we can apply Lemma 4 with $K$ replaced by $\lceil(1-\delta)K\rceil$ to get that for any $1\le k\le1/\delta$,


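$$\mathbb P\big\{|\hat C_k\triangle C_k^*|\le2\epsilon K+3K/\log K\big\}\ge1-o(1), \tag{37}$$

where $\epsilon=1/\sqrt{\min\{\log K,KD(P\|Q)\}}$. Since $\delta$ is a fixed constant, by the union bound over all $1\le k\le1/\delta$, we have that

$$\mathbb P\big\{|\hat C_k\triangle C_k^*|\le2\epsilon K+3K/\log K\text{ for }1\le k\le1/\delta\big\}\ge1-o(1).$$

Since $\epsilon\to0$, we have $2\epsilon K+3K/\log K\le\delta K$ for all sufficiently large $n$, so the desired (31) holds. □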
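The concentration step above rests on the Chernoff bounds (35) and (36). The Python sketch below compares them with exact binomial tails using only the standard library; the parameter grid and helper names are arbitrary choices for this illustration. Rounding the thresholds with `ceil` and `floor` can only shrink the computed tail events, so the comparison errs on the safe side.

```python
import math

def binom_pmf_log(n, p, k):
    """log P{X = k} for X ~ Binom(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def upper_tail(n, p, x):
    """P{X >= x} for X ~ Binom(n, p)."""
    return sum(math.exp(binom_pmf_log(n, p, k)) for k in range(max(0, math.ceil(x)), n + 1))

def lower_tail(n, p, x):
    """P{X <= x} for X ~ Binom(n, p)."""
    return sum(math.exp(binom_pmf_log(n, p, k)) for k in range(0, math.floor(x) + 1))

def check_chernoff(n, p, etas):
    """Return (all (35) checks pass, all (36) checks pass) over the given values of eta."""
    mu = n * p
    ok35 = all(upper_tail(n, p, (1 + e) * mu) <= math.exp(-e ** 2 * mu / 3) for e in etas)
    ok36 = all(lower_tail(n, p, (1 - e) * mu) <= math.exp(-e ** 2 * mu / 2) for e in etas)
    return ok35, ok36
```

With $\eta=1/\log K$ and mean $(1-\delta)K$, both bounds become $\exp(-\Omega(K/\log^2K))$, which is the rate used in the display above.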
5.2 The Necessary Condition

The following lemma gives a necessary condition for exact recovery under the general P/Q model, expressed in terms of probabilities for certain large deviations. Later in the section the lemma is combined with the large deviations lower bound of Lemma 3 to establish the necessary conditions in Theorem 2. This method parallels the method used in the previous section for establishing the sufficient condition in Theorem 2.


Lemma 6. Assume that $K\to\infty$ and $\limsup K/n<1$. Let $L_i$ denote i.i.d. copies of $\log\frac{dP}{dQ}$. If there exists an estimator $\hat C$ such that $\mathbb P\{\hat C=C^*\}\to1$, then for any $K_o\to\infty$ such that $K_o=o(K)$, there exists a threshold $\theta_n$ depending on $n$ such that for all sufficiently large $n$,

$$P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-1)\theta_n-(K_o-1)D(P\|Q)-6\sigma\Big]\le\frac{1}{K_o} \tag{38}$$

$$Q\Big[\sum_{i=1}^{K-1}L_i\ge(K-1)\theta_n\Big]\le\frac{1}{n-K} \tag{39}$$

where $\sigma=\sqrt{K_o\,\mathrm{var}_P(L_1)}$ and $\mathrm{var}_P(L_1)$ denotes the variance of $L_1$ under measure $P$.


Proof. Since the planted cluster $C^*$ is uniformly distributed, the MLE minimizes the error probability among all estimators. Thus, without loss of generality, we can assume the estimator used is $\hat C=\hat C_{\rm ML}$ and the indices are numbered so that $C^*=[K]$. Hence, by assumption, $\mathbb P\{\hat C_{\rm ML}=C^*\}\to1$. For each $i\in C^*$ and $j\notin C^*$, we have

$$e\big(C^*\setminus\{i\}\cup\{j\},\,C^*\setminus\{i\}\cup\{j\}\big)-e(C^*,C^*)=e(j,C^*\setminus\{i\})-e(i,C^*).$$

Let $i_0$ denote the random index such that $i_0=\arg\min_{i\in C^*}e(i,C^*)$. Let $F$ denote the event that

$$\min_{i\in C^*}e(i,C^*)\le\max_{j\notin C^*}e(j,C^*\setminus\{i_0\}), \tag{40}$$

which implies the existence of $j\notin C^*$ such that the set $C^*\setminus\{i_0\}\cup\{j\}$ achieves a likelihood at least as large as that achieved by $C^*$. Since, if the event $F$ happens, the ML estimator fails with probability at least 1/2, it follows that $\frac12\mathbb P\{F\}\le\mathbb P\{\text{ML fails}\}=o(1)$.

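Set $\theta_n'$ to be

$$\theta_n'=\inf\Big\{x\in\mathbb R:P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-1)x-(K_o-1)D(P\|Q)-6\sigma\Big]\ge\frac1{K_o}\Big\}$$

and $\theta_n''$ to be

$$\theta_n''=\sup\Big\{x\in\mathbb R:Q\Big[\sum_{i=1}^{K-1}L_i\ge(K-1)x\Big]\ge\frac1{n-K}\Big\}.$$

Define the events

$$E_1=\Big\{\min_{i\in C^*}e(i,C^*)\le(K-1)\theta_n'\Big\},\qquad E_2=\Big\{\max_{j\notin C^*}e(j,C^*\setminus\{i_0\})\ge(K-1)\theta_n''\Big\}.$$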


We claim that $\mathbb P\{E_1\}=\Omega(1)$ and $\mathbb P\{E_2\}=\Omega(1)$; the proof is deferred to the end. Note that the random index $i_0$ is a function only of the edge variables with both endpoints in $C^*$. Thus $e(j,C^*\setminus\{i_0\})$ for different $j\notin C^*$ are independent and identically distributed, with the same distribution as $\sum_{i=1}^{K-1}L_i$ under measure $Q$. Thus $E_1$ and $E_2$ are independent, so in view of $\mathbb P\{F\}=o(1)$,

$$\mathbb P\{E_1\cap E_2\cap F^c\}\ge\mathbb P\{E_1\cap E_2\}-\mathbb P\{F\}=\mathbb P\{E_1\}\mathbb P\{E_2\}-o(1)=\Omega(1).$$

Since

$$E_1\cap E_2\cap F^c\subset\{\theta_n'>\theta_n''\},$$

and $\theta_n',\theta_n''$ are deterministic, it follows that $\theta_n'>\theta_n''$ for sufficiently large $n$. Set $\theta_n=(\theta_n'+\theta_n'')/2$. Thus $\theta_n<\theta_n'$ and by the definition of $\theta_n'$, (38) holds. Similarly, we have that $\theta_n>\theta_n''$ and by the definition of $\theta_n''$, (39) holds.

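We are left to show $\mathbb P\{E_1\}=\Omega(1)$ and $\mathbb P\{E_2\}=\Omega(1)$. We first prove that $\mathbb P\{E_2\}=\Omega(1)$. Since $Q[\sum_{i=1}^{K-1}L_i\ge x]$ is left-continuous in $x$, it follows that $Q[\sum_{i=1}^{K-1}L_i\ge(K-1)\theta_n'']\ge(n-K)^{-1}$. Therefore,

$$\begin{aligned}\mathbb P\{E_2\}&=1-\prod_{j\notin C^*}\mathbb P\{e(j,C^*\setminus\{i_0\})<(K-1)\theta_n''\}\\&=1-\Big(1-Q\Big[\sum_{i=1}^{K-1}L_i\ge(K-1)\theta_n''\Big]\Big)^{n-K}\\&\ge1-\exp\Big(-Q\Big[\sum_{i=1}^{K-1}L_i\ge(K-1)\theta_n''\Big](n-K)\Big)\ge1-e^{-1},\end{aligned}$$

where the first equality holds because $e(j,C^*\setminus\{i_0\})$ are independent for different $j\notin C^*$; the second equality holds because $e(j,C^*\setminus\{i_0\})$ has the same distribution as $\sum_{i=1}^{K-1}L_i$ under measure $Q$; the first inequality is due to $1-x\le e^{-x}$ for $x\in\mathbb R$; the last inequality holds because $Q[\sum_{i=1}^{K-1}L_i\ge(K-1)\theta_n'']\ge(n-K)^{-1}$. So $\mathbb P\{E_2\}=\Omega(1)$ is proved.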

Next, we show that $\mathbb P\{E_1\}=\Omega(1)$. The proof is similar to the proof of $\mathbb P\{E_2\}=\Omega(1)$ just given, but it is complicated by the fact that the random variables $e(i,C^*)$ for $i\in C^*$ are not independent. Since $P[\sum_{i=1}^{K-K_o}L_i\le x]$ is right-continuous in $x$, it follows from the definition of $\theta_n'$ that

$$P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-1)\theta_n'-(K_o-1)D(P\|Q)-6\sigma\Big]\ge\frac1{K_o}. \tag{41}$$

For all $i\in C^*$, $e(i,C^*)$ has the same distribution as $\sum_{j=1}^{K-1}L_j$ under measure $P$, but they are not independent. Let $T$ be the set of the first $K_o$ indices in $C^*$, i.e., $T=[K_o]$, where $K_o=o(K)$ and $K_o\to\infty$. Recall that $\sigma=\sqrt{K_o\,\mathrm{var}_P(L_1)}$, and let $T'=\{i\in T:e(i,T)\le(K_o-1)D(P\|Q)+6\sigma\}$. Since

$$\min_{i\in C^*}e(i,C^*)\le\min_{i\in T'}e(i,C^*)\le\min_{i\in T'}e(i,C^*\setminus T)+(K_o-1)D(P\|Q)+6\sigma,$$

it follows that

$$\mathbb P\{E_1\}\ge\mathbb P\Big\{\min_{i\in T'}e(i,C^*\setminus T)\le(K-1)\theta_n'-(K_o-1)D(P\|Q)-6\sigma\Big\}.$$


In case $T'=\emptyset$, we adopt the convention that the minimum of an empty set of numbers is $+\infty$.
We show next that $\mathbb P\{|T'|\ge K_o/2\}\to1$ as $n\to\infty$. For $i\in T$, $e(i,T)=X_i+Y_i$, where $X_i=e(i,\{1,\dots,i-1\})$ and $Y_i=e(i,\{i+1,\dots,K_o\})$. The $X$'s are mutually independent, the $Y$'s are also mutually independent, $X_i$ has the same distribution as $\sum_{j=1}^{i-1}L_j$, and $Y_i$ has the same distribution as $\sum_{j=1}^{K_o-i}L_j$, where $L_j$ is distributed under measure $P$. Then $\mathbb E[X_i]=(i-1)D(P\|Q)$ and $\mathrm{var}(X_i)\le\sigma^2$. Thus, by the Chebyshev inequality, $\mathbb P\{X_i\ge(i-1)D(P\|Q)+3\sigma\}\le\frac19$ for all $i\in T$. Therefore, $|\{i:X_i\le(i-1)D(P\|Q)+3\sigma\}|$ is stochastically at least as large as a $\mathrm{Binom}(K_o,\frac89)$ random variable, so that $\mathbb P\{|\{i:X_i\le(i-1)D(P\|Q)+3\sigma\}|\ge\frac{3K_o}4\}\to1$ as $K_o\to\infty$. Similarly, $\mathbb P\{|\{i:Y_i\le(K_o-i)D(P\|Q)+3\sigma\}|\ge\frac{3K_o}4\}\to1$ as $K_o\to\infty$. If at least 3/4 of the $X$'s are small and at least 3/4 of the $Y$'s are small, it follows that at least 1/2 of the $e(i,T)$'s for $i\in T$ are small. Therefore, as claimed, $\mathbb P\{|T'|\ge K_o/2\}\to1$ as $K_o\to\infty$.

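The set $T'$ is independent of $\{e(i,C^*\setminus T):i\in T\}$, and each of those variables has the same distribution as $\sum_{j=1}^{K-K_o}L_j$ under measure $P$. Thus,

$$\begin{aligned}\mathbb P\{E_1\}&\ge1-\mathbb E\Big[\prod_{j\in T'}\mathbb P\big\{e(j,C^*\setminus T)>(K-1)\theta_n'-(K_o-1)D(P\|Q)-6\sigma\big\}\Big]\\&\ge1-\exp\Big(-P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-1)\theta_n'-(K_o-1)D(P\|Q)-6\sigma\Big]\frac{K_o}2\Big)-\mathbb P\Big\{|T'|<\frac{K_o}2\Big\}\\&\ge1-e^{-1/2}-o(1),\end{aligned}$$

where the last inequality follows from (41). Therefore, $\mathbb P\{E_1\}=\Omega(1)$. □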


Proof of Necessary Part of Theorem 2. Since the joint condition (8) is necessary for weak recovery, and hence also for exact recovery, it suffices to prove (14) under the assumption that (8) holds, i.e.,

$$KD(P\|Q)\to\infty,\qquad KD(P\|Q)\ge(2-\epsilon_o)\log(n/K) \tag{42}$$

for any fixed constant $\epsilon_o\in(0,1)$ and all sufficiently large $n$. It follows, using $E_Q(\theta)=E_P(\theta)+\theta\ge\theta$ and the monotonicity of $E_Q$, that

$$\frac1K\log\frac nK\le E_Q\Big(\frac1K\log\frac nK\Big)\le E_Q(D(P\|Q))=D(P\|Q).$$

Thus if $K=O(1)$, then (42) implies (14). Hence, we assume $K\to\infty$ in the following without loss of generality.

For the sake of argument by contradiction, suppose that (14) does not hold. Then, by going to a subsequence, we can assume that

$$\limsup_{n\to\infty}\frac{KE_Q(\gamma)}{\log n}<1, \tag{43}$$

where $\gamma=\frac1K\log\frac nK$. It follows from (42) that $\gamma\le\frac{1}{2-\epsilon_o}D(P\|Q)$.

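We shall apply Lemma 6 to argue a contradiction. The left-hand side of (38) is nondecreasing in $\theta_n$ and the left-hand side of (39) is nonincreasing in $\theta_n$, so as a witness to the nonexistence of $\theta_n$ satisfying (38) and (39) it suffices to show that if $\theta_n=\gamma$ then neither (38) nor (39) holds. By Lemma 2, $D(P\|Q)\asymp D(Q\|P)$. Since $0\le\gamma\le\frac{1}{2-\epsilon_o}D(P\|Q)$, choosing $\delta>0$ to be a sufficiently small constant ensures that both $\gamma$ and $\gamma+\delta D(Q\|P)$ lie in $[-D(Q\|P),D(P\|Q)]$. Then Assumption 1 and Corollary 5 yield:

$$Q\Big[\sum_{i=1}^{K-1}L_i\ge(K-1)\gamma\Big]\ge\exp\left(-\frac{(K-1)E_Q(\gamma+\delta D(Q\|P))+\log2}{1-\frac{C}{(K-1)\delta^2D(Q\|P)}}\right).$$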
By the properties of $E_Q$ discussed in Remark 3,

$$E_Q(\gamma+\delta D(Q\|P))\le E_Q(\gamma)+\delta D(Q\|P),$$

and by Lemma 2,

$$\delta D(Q\|P)\le2\delta CE_Q(0)\le2\delta CE_Q(\gamma), \tag{44}$$

so, in view of (43), if $\delta$ is sufficiently small,

$$(K-1)E_Q(\gamma+\delta D(Q\|P))\le(1-2\delta)\log n$$

for all sufficiently large $n$. Also, recall that $D(P\|Q)\asymp D(Q\|P)$ and hence (42) implies that $KD(Q\|P)\to\infty$. Therefore,

$$Q\Big[\sum_{i=1}^{K-1}L_i\ge(K-1)\gamma\Big]\ge n^{-1+\delta}$$

for all sufficiently large $n$. Thus, (39) does not hold for $\theta_n=\gamma$.


Turning to (38) (with $\theta_n=\gamma$), we let $K_o=K/\log K$ and

$$\delta'=\frac{(K_o-1)(D(P\|Q)-\gamma)+6\sigma}{(K-K_o)D(P\|Q)},$$

where $\sigma=\sqrt{K_o\,\mathrm{var}_P[L]}$. Note that $\mathrm{var}_P[L]=\psi_Q''(1)\le CD(P\|Q)$ by Assumption 1, and recall that from (42) we have $\gamma\le\frac{1}{2-\epsilon_o}D(P\|Q)$. Furthermore, since $K\to\infty$ and $KD(P\|Q)\to\infty$ by (42), we conclude that $\delta'=o(1)$.

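Since $D(P\|Q)\asymp D(Q\|P)$ and $0\le\gamma\le\frac{1}{2-\epsilon_o}D(P\|Q)$, choosing $\delta$ to be a sufficiently small constant ensures that both $\gamma-\delta'D(P\|Q)$ and $\gamma-(\delta'+\delta)D(P\|Q)$ lie in $[-D(Q\|P),D(P\|Q)]$. Hence, applying Corollary 5 yields

$$\begin{aligned}P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-1)\gamma-(K_o-1)D(P\|Q)-6\sigma\Big]&=P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-K_o)\big(\gamma-\delta'D(P\|Q)\big)\Big]\\&\ge\exp\left(-\frac{(K-K_o)E_P\big(\gamma-(\delta'+\delta)D(P\|Q)\big)+\log2}{1-\frac{C}{(K-K_o)\delta^2D(P\|Q)}}\right).\end{aligned} \tag{45}$$

Moreover, in view of the fact that $E_P(\cdot)$ is decreasing and (23),

$$E_P(\gamma)\ge E_P\Big(\frac{D(P\|Q)}{2-\epsilon_o}\Big)\ge\frac{(1-\epsilon_o)^2}{2C(2-\epsilon_o)^2}D(P\|Q).$$

Let $C'=\frac{(1-\epsilon_o)^2}{2C(2-\epsilon_o)^2}$. Similar to the properties of $E_Q$ discussed in Remark 3,

$$E_P\big(\gamma-(\delta'+\delta)D(P\|Q)\big)\le E_P(\gamma)+(\delta'+\delta)D(P\|Q)\le E_P(\gamma)\Big(1+\frac{\delta'+\delta}{C'}\Big). \tag{46}$$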
Since $E_P(\gamma)=E_Q(\gamma)-\gamma$, by (43), there exists some $\epsilon>0$ such that

$$KE_P(\gamma)\le(1-\epsilon)\log n-\log(n/K)=-\epsilon\log n+\log K\le(1-\epsilon)\log K.$$

Thus by choosing $\delta$ sufficiently small and in view of $\delta'=o(1)$,

$$(K-K_o)E_P\big(\gamma-(\delta'+\delta)D(P\|Q)\big)\le(1-2\epsilon')\log K$$

for some $\epsilon'>0$. Recalling that $KD(P\|Q)\to\infty$, it readily follows from (45) that

$$P\Big[\sum_{i=1}^{K-K_o}L_i\le(K-1)\gamma-(K_o-1)D(P\|Q)-6\sigma\Big]\ge K^{-1+\epsilon'}.$$

Since $K^{-1+\epsilon'}>\frac{\log K}{K}=\frac1{K_o}$ for all sufficiently large $n$, (38) does not hold for $\theta_n=\gamma$. Thus, with $\theta_n=\gamma$, neither (38) nor (39) holds for all sufficiently large $n$. Therefore, there does not exist a sequence $\theta_n$ such that both (38) and (39) hold for all sufficiently large $n$, contradicting the conclusion of Lemma 6. □


Appendices 

A Equivalence of Weak Recovery in Expectation and in Probability

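Lemma 7. There exists an estimator $\hat\xi$ such that $\frac{d_H(\hat\xi,\xi)}{K}\to0$ in probability if and only if there exists an estimator $\tilde\xi$ such that $\frac{\mathbb E[d_H(\tilde\xi,\xi)]}{K}\to0$.

Proof. One direction is automatic because convergence in $L_1$ implies convergence in probability. Conversely, suppose $\frac{d_H(\hat\xi,\xi)}{K}\to0$ in probability for some (sequence of) $\hat\xi$. Then there exists a deterministic sequence $\epsilon_n\to0$ such that $\mathbb P\{d_H(\hat\xi,\xi)\ge\epsilon_nK\}\le\epsilon_n$. Define a new estimator by

$$\tilde\xi=\hat\xi\,\mathbf 1_{\{|\hat\xi|\le K+\epsilon_nK\}}+\mathbf 0\,\mathbf 1_{\{|\hat\xi|>K+\epsilon_nK\}},$$

where $\mathbf 0$ denotes the all-zero vector and $|\hat\xi|$ denotes the Hamming weight. Since $|\xi|=K$, by the triangle inequality, we have

$$\begin{aligned}\mathbb E[d_H(\tilde\xi,\xi)]&=\mathbb E\big[d_H(\hat\xi,\xi)\mathbf 1_{\{|\hat\xi|\le K+\epsilon_nK\}}\big]+K\,\mathbb P\{|\hat\xi|>K+\epsilon_nK\}\\&\le\epsilon_nK+(2K+\epsilon_nK)\,\mathbb P\{d_H(\hat\xi,\xi)\ge\epsilon_nK\}+K\,\mathbb P\{d_H(\hat\xi,\xi)>\epsilon_nK\}\\&\le\epsilon_nK+(3K+\epsilon_nK)\epsilon_n=4\epsilon_nK+\epsilon_n^2K.\end{aligned}$$

Therefore, $\frac{\mathbb E[d_H(\tilde\xi,\xi)]}{K}\to0$. □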
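The truncation used in the proof of Lemma 7 is simple to state in code. In the Python sketch below (function names are ours), $\hat\xi$ is a 0/1 list; it is replaced by the all-zero vector when its Hamming weight exceeds $K+\epsilon_nK$. On that event the error becomes exactly $K$, while the original error already exceeded $\epsilon_nK$ by the triangle inequality, which is what the second and third lines of the display exploit.

```python
def hamming(x, y):
    """Hamming distance between two equal-length 0/1 sequences."""
    return sum(a != b for a, b in zip(x, y))

def truncate_estimate(xi_hat, K, eps):
    """xi_tilde = xi_hat if |xi_hat| <= K + eps*K, otherwise the all-zero vector."""
    if sum(xi_hat) <= K + eps * K:
        return list(xi_hat)
    return [0] * len(xi_hat)
```

For example, with $\xi=(1,1,1,0,0,0,0,0)$ and $K=3$, an estimate of weight 3 is left unchanged, whereas the all-ones estimate (error 5) is truncated to the zero vector, whose error is $K=3$.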


B Assumption 1 for exponential families of distributions 

There is a simple sufficient condition for Assumption 1 to hold in case $P$ and $Q$ are from the same exponential family of distributions (including Bernoulli, Gaussian, etc.). Consider a canonical exponential family with the following pdf (with respect to some dominating measure):

$$p_\theta(x)=h(x)\exp(\theta T(x)-A(\theta)),$$

where $A$ is a convex function. Then $E_\theta[T]=A'(\theta)$ and $\mathrm{var}_\theta[T]=A''(\theta)$. Assume that $P$ and $Q$ correspond to parameters $\theta_1$ and $\theta_0$, respectively. It could be that $\theta_0<\theta_1$ or $\theta_1<\theta_0$; let $I$ denote the interval with endpoints $\theta_0$ and $\theta_1$, and $J$ denote the interval with endpoints $\theta_0\pm(\theta_1-\theta_0)$. Then $Q_\lambda$ has parameter $\lambda\theta_1+\bar\lambda\theta_0$, where $\bar\lambda=1-\lambda$. Furthermore,


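$$\begin{aligned}L&=(\theta_1-\theta_0)T-A(\theta_1)+A(\theta_0)\\D(P\|Q)&=A(\theta_0)-A(\theta_1)-(\theta_0-\theta_1)A'(\theta_1)\\ \psi_Q(\lambda)&=A(\lambda\theta_1+\bar\lambda\theta_0)-\bar\lambda A(\theta_0)-\lambda A(\theta_1)\\C(P,Q)&=-\min_{\lambda\in[0,1]}\psi_Q(\lambda)\\ \psi_Q''(\lambda)&=A''(\lambda\theta_1+\bar\lambda\theta_0)(\theta_1-\theta_0)^2.\end{aligned}$$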

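By Taylor's theorem, $D(P\|Q)$ is $\frac{(\theta_1-\theta_0)^2}{2}$ times a weighted average of $A''$ over $I$:

$$D(P\|Q)=(\theta_1-\theta_0)^2\int_0^1A''\big(\theta_1+s(\theta_0-\theta_1)\big)(1-s)\,ds,\qquad\int_0^1(1-s)\,ds=\frac12.$$

Similarly, $D(Q\|P)$ is $\frac{(\theta_1-\theta_0)^2}{2}$ times a weighted average of $A''$ over $I$. Since $\psi_Q''(\lambda)\le(\theta_1-\theta_0)^2\max_{\theta\in J}A''(\theta)$ for $\lambda\in[-1,1]$, while $\min\{D(P\|Q),D(Q\|P)\}\ge\frac{(\theta_1-\theta_0)^2}{2}\min_{\theta\in I}A''(\theta)$, a sufficient condition for Assumption 1 is

$$\frac{\max_{\theta\in J}A''(\theta)}{\min_{\theta\in I}A''(\theta)}=O(1). \tag{47}$$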

Examples: 


1. Gaussian: 9 = p, A{9) = 9“^I2 and A"{9) = 1. So (1) holds in the Gaussian case with no 
extra assumption. 


2. Bernoulli: $\theta = \log\frac{p}{1-p}$, $A(\theta) = \log(1+e^\theta)$ and $A''(\theta) = \frac{e^\theta}{(1+e^\theta)^2} = p(1-p)$. We shall show that if
$p, q$ vary such that $p, q \in (0,1)$ with $p \ne q$, then (47) is equivalent to boundedness of the LLR.
By symmetry between 0 and 1 we can assume without loss of generality that $0 < q < p < 1$.
First, if $p \le 1/2$, the LHS of (47) is $\asymp p/q$, and if $p \in [1/2, 1-\epsilon]$ for some fixed $\epsilon > 0$, then the
LHS of (47) has size $\Theta(1/q) = \Theta(p/q)$. So the claim is true if $p$ is bounded away from one.
If $p \to 1$ and $q \not\to 1$, then both the LHS of (47) and the LLR are unbounded, so the claim is
again true.

It remains to check the case $p, q \to 1$. Write $\bar p = 1-p$ and $\bar q = 1-q$. The denominator of the LHS of (47) is $p\bar p \asymp \bar p$. The
maximum in the numerator is taken over the interval $[\theta_{-1}, \theta_1]$, where $\theta_{-1} = \theta_0 - (\theta_1 - \theta_0) =
\log\frac{q^2\bar p}{\bar q^2 p}$. If $\theta_{-1} \le 0$ (i.e., $\theta_0 \le \theta_1/2$), then the numerator of the LHS of (47) is $1/4$, so (47)
fails to hold; moreover, $\bar p/\bar q = O(\sqrt{\bar p})$, so the LLR is unbounded. It thus remains to consider
the case $\theta_1/2 < \theta_0 < \theta_1$ with $\theta_1 \to \infty$. The numerator of the LHS of (47) is then $r\bar r$, where $\bar r = 1-r$ and $r$ is
determined by $\theta_{-1} = \log\frac{r}{\bar r}$, or, equivalently, $\frac{r}{\bar r} = \frac{q^2\bar p}{\bar q^2 p}$. Hence $\bar r \asymp \bar q^2/\bar p$. The LHS of (47) is
therefore $\asymp \bar r/\bar p \asymp (\bar q/\bar p)^2$, while the maximum absolute value of the LLR is $\Theta(\log(\bar q/\bar p))$. Hence, again, (47)
holds if and only if the LLR is bounded. The claim is proved.
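The dichotomy in the case $p, q \to 1$ can be illustrated numerically by evaluating (47) with the Bernoulli $A''$, parametrized by $\bar p$ and $\bar q$ to avoid cancellation in $1 - p$. The two sequences below, $\bar q = 3\bar p$ and $\bar q = \sqrt{\bar p}$, are our own illustrative choices, and the helper names are hypothetical. Along the first, the LLR stays bounded and the LHS of (47) stays below $9 = (\bar q/\bar p)^2$; along the second, $\theta_{-1} < 0$, the LHS of (47) equals $1/(4p\bar p)$, and the LLR grows like $\frac{1}{2}\log(1/\bar p)$.

```python
import math

def A2(t):
    # Bernoulli A''(theta), symmetric and unimodal with peak 1/4 at theta = 0
    e = math.exp(-abs(t))
    return e / (1.0 + e) ** 2

def theta_from_bar(pb):
    # natural parameter of Bern(1 - pb): log((1 - pb) / pb)
    return math.log((1.0 - pb) / pb)

def lhs47_bar(pb, qb):
    # LHS of (47) for p = 1 - pb, q = 1 - qb
    t1, t0 = theta_from_bar(pb), theta_from_bar(qb)
    d = abs(t1 - t0)
    a, b = t0 - d, t0 + d                 # the interval J
    num = A2(min(max(0.0, a), b))         # max of A'' over J
    den = min(A2(t0), A2(t1))             # min of A'' over I
    return num / den

def max_abs_llr_bar(pb, qb):
    # the LLR takes the values log(p/q) at x = 1 and log(pbar/qbar) at x = 0
    return max(abs(math.log((1 - pb) / (1 - qb))), abs(math.log(pb / qb)))

pbs = [10.0 ** (-k) for k in range(2, 8)]
bounded = [(lhs47_bar(pb, 3 * pb), max_abs_llr_bar(pb, 3 * pb)) for pb in pbs]
unbounded = [(lhs47_bar(pb, math.sqrt(pb)), max_abs_llr_bar(pb, math.sqrt(pb))) for pb in pbs]

for pb, (l1, r1), (l2, r2) in zip(pbs, bounded, unbounded):
    print(f"pbar={pb:.0e}  qbar=3pbar: {l1:8.3f} {r1:6.3f}  qbar=sqrt(pbar): {l2:12.1f} {r2:6.3f}")
```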


*For simplicity we assume $T$ and $\theta$ are scalar valued. Vector values would give $p_\theta(x) = h(x)\exp(\langle\theta, T(x)\rangle - A(\theta))$,
and the condition (47), with $A''(\theta)$ replaced by $(\theta_1 - \theta_0)^\top H(\theta)(\theta_1 - \theta_0)$, where $H$ is the Hessian of $A$, and with $I$ and $J$
becoming line segments, is still sufficient for Assumption 1.
C Proof of Corollary 4


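In the Gaussian case, $E_Q(\theta) = \frac{1}{8}\left(\mu + \frac{2\theta}{\mu}\right)^2$. Throughout this proof, let $\theta = \frac{1}{K}\log\frac{n}{K}$ and let $f$ be
the function defined by $f(\mu) = E_Q(\theta) = \frac{1}{8}\left(\mu + \frac{2\theta}{\mu}\right)^2$. Consider the equation $f(\mu) = \frac{\log n}{K}$. It yields
a quadratic equation in $\mu^2$:

$$\mu^4 - \frac{4\log(nK)}{K}\mu^2 + \frac{4\log^2(n/K)}{K^2} = 0,$$

with two solutions, namely

$$\mu_\pm^2 = \frac{2}{K}\left(\sqrt{\log n} \pm \sqrt{\log K}\right)^2.$$

Without loss of generality, we take $\mu_+ > 0$ and $\mu_- > 0$; the case of
$\mu_+ < 0$ and $\mu_- < 0$ follows analogously. In summary, the expressions inside the liminf in both (13)
and (19) are one if $\mu$ is replaced by $\mu_+$.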
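The roots $\mu_\pm$ can be double-checked numerically. The Python sketch below uses the illustrative values $n = 10^6$ and $K = 100$ (our choice) and verifies that both $\mu_+$ and $\mu_-$ solve $f(\mu) = \frac{\log n}{K}$ and the quadratic in $\mu^2$. It also evaluates $\frac{\sqrt{K}\mu_+}{\sqrt{2\log K} + \sqrt{2\log n}}$, which we assume is the form of the expression in (19).

```python
import math

def f(mu, n, K):
    # f(mu) = E_Q(theta) = (mu + 2*theta/mu)^2 / 8, with theta = log(n/K)/K
    theta = math.log(n / K) / K
    return (mu + 2 * theta / mu) ** 2 / 8

def mu_roots(n, K):
    # mu_pm = sqrt(2/K) (sqrt(log n) +/- sqrt(log K))
    a, b = math.sqrt(math.log(n)), math.sqrt(math.log(K))
    c = math.sqrt(2 / K)
    return c * (a + b), c * (a - b)

def quartic(mu, n, K):
    # mu^4 - (4 log(nK)/K) mu^2 + 4 log^2(n/K) / K^2
    return mu ** 4 - 4 * math.log(n * K) / K * mu ** 2 + 4 * math.log(n / K) ** 2 / K ** 2

n, K = 10 ** 6, 100
mu_p, mu_m = mu_roots(n, K)

print(K * f(mu_p, n, K) / math.log(n), K * f(mu_m, n, K) / math.log(n))  # both 1 up to rounding
print(quartic(mu_p, n, K), quartic(mu_m, n, K))                          # both 0 up to rounding
print(math.sqrt(K) * mu_p / (math.sqrt(2 * math.log(K)) + math.sqrt(2 * math.log(n))))
```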

For the sufficiency part, suppose $\mu$ depends on $n$ such that (11) and (19) hold. By (19), for
$\epsilon > 0$ sufficiently small, $\mu(1-\epsilon) \ge \mu_+$ for all sufficiently large $n$. We can also take $\epsilon < 1/10$. By
(11), $\limsup_{n\to\infty} \frac{4\theta}{\mu^2} < 1$, so uniformly for $(1-\epsilon)\mu \le x \le \mu$,


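$$
f'(x) = \frac{1}{4}\left(x + \frac{2\theta}{x}\right)\left(1 - \frac{2\theta}{x^2}\right)
\ge \frac{1}{4}(1-\epsilon)\mu\left(1 - \frac{2\theta}{(1-\epsilon)^2\mu^2}\right) = \Omega(\mu).
$$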


Also, $\frac{2\theta}{\mu_+^2} = \frac{\log(n/K)}{(\sqrt{\log n} + \sqrt{\log K})^2} < 1$, so $f'(x) > 0$ for $x \ge \mu_+$. Hence,


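$$
\frac{f(\mu)}{f(\mu_+)} - 1 \ge \frac{f(\mu) - f(\mu(1-\epsilon))}{f(\mu_+)}
= \frac{K}{\log n}\int_{(1-\epsilon)\mu}^{\mu} f'(x)\,dx
= \Omega\left(\frac{\epsilon K\mu^2}{\log n}\right) = \Omega(1),
$$

where for the last equality we use $K\mu^2 \ge K\mu_+^2 \ge 2\log n$. Therefore (13) holds, and sufficiency follows from
Theorem 2.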
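The two ingredients of the sufficiency argument, the formula for $f'$ and its lower bound on $[(1-\epsilon)\mu, \mu]$, can be checked numerically for a concrete instance. The sketch below takes $n = 10^6$, $K = 100$, $\epsilon = 0.05$ and $\mu = \mu_+/(1-\epsilon)$, all illustrative choices of ours. It compares $f'$ with a central difference, checks the lower bound on a grid, and confirms that $K f(\mu)/\log n > 1$.

```python
import math

def f(mu, theta):
    # f(mu) = E_Q(theta) = (mu + 2*theta/mu)^2 / 8 in the Gaussian case
    return (mu + 2 * theta / mu) ** 2 / 8

def f_prime(x, theta):
    # f'(x) = (1/4) (x + 2*theta/x) (1 - 2*theta/x^2)
    return 0.25 * (x + 2 * theta / x) * (1 - 2 * theta / x ** 2)

n, K, eps = 10 ** 6, 100, 0.05               # illustrative values
theta = math.log(n / K) / K
mu_plus = math.sqrt(2 / K) * (math.sqrt(math.log(n)) + math.sqrt(math.log(K)))
mu = mu_plus / (1 - eps)                       # so that mu (1 - eps) = mu_plus

# lower bound from the proof, valid for (1 - eps) mu <= x <= mu
bound = 0.25 * (1 - eps) * mu * (1 - 2 * theta / ((1 - eps) ** 2 * mu ** 2))
grid = [(1 - eps) * mu + k * eps * mu / 1000 for k in range(1001)]

h = 1e-6
fd = (f(mu + h, theta) - f(mu - h, theta)) / (2 * h)  # central difference for f'(mu)

print(fd, f_prime(mu, theta))
print(4 * theta / mu ** 2)                     # below 1 in this instance
print(min(f_prime(x, theta) for x in grid), bound)
print(K * f(mu, theta) / math.log(n))          # exceeds 1
```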

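For the necessity part, it suffices to show that (12) and (14) imply (20). If $K \le n^{1/9}$, then (12)
alone implies (20), because in that case $4\log\frac{n}{K} \ge \frac{32}{9}\log n \ge 2\left(\sqrt{\log n} + \sqrt{\log K}\right)^2$; so we can also assume that $K \ge n^{1/9}$. It follows that

$$\frac{2\theta}{\mu_+^2} = \frac{\log(n/K)}{\left(\sqrt{\log n} + \sqrt{\log K}\right)^2} \le \frac{(8/9)\log n}{(16/9)\log n} = \frac{1}{2}.$$

Therefore, for $\epsilon \in (0, 0.1)$,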

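$$
\begin{aligned}
f(\mu_+(1-\epsilon)) &\le f(\mu_+) - \epsilon\mu_+\min\{f'(x) : (1-\epsilon)\mu_+ \le x \le \mu_+\}\\
&\le f(\mu_+) - \frac{\epsilon}{4}(1-\epsilon)\mu_+^2\left(1 - \frac{1}{2(1-\epsilon)^2}\right)\\
&\le f(\mu_+) - \Omega(\epsilon\mu_+^2) \le \frac{\log n}{K}\left(1 - \Omega(\epsilon)\right).
\end{aligned}
$$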


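In view of (14), it follows that $\mu \ge \mu_+(1-\epsilon)$ for all sufficiently large $n$; here we also use that $f$ is increasing on $[\sqrt{2\theta}, \infty)$, an interval that contains $\mu_+(1-\epsilon)$ and, by (12), eventually also $\mu$. Since $\epsilon$ can be arbitrarily
small, (20) follows.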
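The three numerical facts used in the necessity argument can be spot-checked for $n = 10^9$, an illustrative choice for which $n^{1/9} = 10$. The first is that $4\log\frac{n}{K} \ge 2(\sqrt{\log n}+\sqrt{\log K})^2$ when $K \le n^{1/9}$. The second is that $2\theta/\mu_+^2 \le 1/2$ when $K \ge n^{1/9}$. The third is an explicit version of the final bound, $f(\mu_+(1-\epsilon)) \le \frac{\log n}{K}\bigl(1 - \frac{\epsilon}{2}(1-\epsilon)(1 - \frac{1}{2(1-\epsilon)^2})\bigr)$, which we obtain from the chain above using $\mu_+^2 \ge 2\log n/K$. The helper names are hypothetical.

```python
import math

def theta_of(n, K):
    return math.log(n / K) / K

def mu_plus(n, K):
    return math.sqrt(2 / K) * (math.sqrt(math.log(n)) + math.sqrt(math.log(K)))

def f(mu, n, K):
    # f(mu) = (mu + 2*theta/mu)^2 / 8 with theta = log(n/K)/K
    return (mu + 2 * theta_of(n, K) / mu) ** 2 / 8

def small_K_margin(n, K):
    # 4 log(n/K) - 2 (sqrt(log n) + sqrt(log K))^2; nonnegative when K <= n^(1/9)
    return 4 * math.log(n / K) - 2 * (math.sqrt(math.log(n)) + math.sqrt(math.log(K))) ** 2

def ratio(n, K):
    # 2 theta / mu_+^2; at most 1/2 when K >= n^(1/9)
    return 2 * theta_of(n, K) / mu_plus(n, K) ** 2

def explicit_bound(n, K, eps):
    # (log n / K) (1 - (eps/2)(1 - eps)(1 - 1/(2 (1 - eps)^2)))
    c = (eps / 2) * (1 - eps) * (1 - 1 / (2 * (1 - eps) ** 2))
    return math.log(n) / K * (1 - c)

n = 10 ** 9  # illustrative; n^(1/9) = 10
for K in [2, 5, 10]:
    print("K =", K, "margin:", small_K_margin(n, K))
for K in [10, 100, 10 ** 4, 10 ** 8]:
    for eps in [0.01, 0.05, 0.09]:
        print(K, eps, ratio(n, K), f(mu_plus(n, K) * (1 - eps), n, K), explicit_bound(n, K, eps))
```

At $K = n^{1/9}$ both of the first two inequalities hold with equality, which is why that threshold appears in the argument.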